butterfree.hooks.schema_compatibility package

Submodules

Cassandra table schema compatibility Hook definition.

class butterfree.hooks.schema_compatibility.cassandra_table_schema_compatibility_hook.CassandraTableSchemaCompatibilityHook(cassandra_client: CassandraClient, table: str)

Bases: Hook

Hook to verify schema compatibility with a Cassandra table.

Verifies that every column present in the dataframe exists in the target Cassandra table and has the same type.

cassandra_client

client to connect to Cassandra DB.

table

table name.

run(dataframe: DataFrame) DataFrame

Check the schema compatibility of a given DataFrame.

This method does not modify the DataFrame.

Parameters:

dataframe – dataframe to verify schema compatibility.

Returns:

unchanged dataframe.

Raises:

ValueError – if the schemas are incompatible.
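
The column check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not butterfree's implementation: both schemas are modeled as dictionaries mapping column name to type name, and the helper name `check_schema_compatibility` and its error message are hypothetical.

```python
from typing import Dict


def check_schema_compatibility(
    dataframe_schema: Dict[str, str], table_schema: Dict[str, str]
) -> None:
    """Raise ValueError if a dataframe column is missing or typed differently.

    Hypothetical helper: both schemas map column name -> type name.
    """
    problems = []
    for column, column_type in dataframe_schema.items():
        if column not in table_schema:
            problems.append(f"{column}: missing on the target table")
        elif table_schema[column] != column_type:
            problems.append(
                f"{column}: dataframe has {column_type}, "
                f"table has {table_schema[column]}"
            )
    if problems:
        raise ValueError("Incompatible schemas: " + "; ".join(problems))


table_schema = {"id": "int", "name": "text", "created_at": "timestamp"}

# Compatible: every dataframe column exists on the table with the same type.
# Extra table columns (created_at) are not a problem in this sketch.
check_schema_compatibility({"id": "int", "name": "text"}, table_schema)

# Incompatible: "name" has a different type and "age" does not exist.
try:
    check_schema_compatibility(
        {"id": "int", "name": "int", "age": "int"}, table_schema
    )
    message = ""
except ValueError as error:
    message = str(error)
```

Collecting every mismatch before raising, rather than stopping at the first one, is a design choice of this sketch; it makes the error message more useful when several columns are wrong at once.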

Spark table schema compatibility Hook definition.

class butterfree.hooks.schema_compatibility.spark_table_schema_compatibility_hook.SparkTableSchemaCompatibilityHook(spark_client: SparkClient, table: str, database: Optional[str] = None)

Bases: Hook

Hook to verify schema compatibility with a Spark table.

Verifies that every column present in the dataframe exists in the target Spark table and has the same type.

spark_client

client to connect to Spark’s metastore.

table

table name.

database

database name.
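
Because `database` is optional, the table may be addressed with or without a database qualifier. The snippet below is a sketch of one way the two attributes could be combined into an identifier; the helper name `qualified_table_name` and the backtick quoting are assumptions made for illustration, not something this reference documents.

```python
from typing import Optional


def qualified_table_name(table: str, database: Optional[str] = None) -> str:
    """Hypothetical helper: build the identifier a metastore lookup could use."""
    # Backticks guard against names that contain special characters.
    if database:
        return f"`{database}`.`{table}`"
    return f"`{table}`"


# "orders" and "feature_store" are made-up names used only for this example.
with_database = qualified_table_name("orders", database="feature_store")
without_database = qualified_table_name("orders")
```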

run(dataframe: DataFrame) DataFrame

Check the schema compatibility of a given DataFrame.

This method does not modify the DataFrame.

Parameters:

dataframe – dataframe to verify schema compatibility.

Returns:

unchanged dataframe.

Raises:

ValueError – if the schemas are incompatible.
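
The contract of `run` (validate, then return the input unchanged, or raise `ValueError`) can be illustrated with a small stand-in class. This is a sketch under assumptions: `SchemaCheckHook` is hypothetical, and the "dataframe" is modeled as a plain dictionary with a `schema` entry instead of a Spark DataFrame.

```python
from typing import Any, Dict


class SchemaCheckHook:
    """Hypothetical stand-in for a schema compatibility hook."""

    def __init__(self, table_schema: Dict[str, str]) -> None:
        self.table_schema = table_schema

    def run(self, dataframe: Dict[str, Any]) -> Dict[str, Any]:
        # The "dataframe" here is a plain dict with a "schema" entry
        # (column name -> type name); a real hook would receive a Spark DataFrame.
        for column, column_type in dataframe["schema"].items():
            # A missing column yields None from .get(), which also fails the check.
            if self.table_schema.get(column) != column_type:
                raise ValueError(f"Incompatible column: {column} ({column_type})")
        # Validation only: hand back the very same object, untouched.
        return dataframe


hook = SchemaCheckHook({"id": "bigint", "amount": "double"})

good = {"schema": {"id": "bigint", "amount": "double"}, "rows": [(1, 9.5)]}
result = hook.run(good)
same_object = result is good

bad = {"schema": {"id": "string"}, "rows": [("1",)]}
try:
    hook.run(bad)
    rejected = False
except ValueError:
    rejected = True
```

Returning the input as-is lets a check like this sit in front of a write step without affecting the data that flows through it.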

Module contents

Holds the schema compatibility Hook definitions.

class butterfree.hooks.schema_compatibility.CassandraTableSchemaCompatibilityHook(cassandra_client: CassandraClient, table: str)

Bases: Hook

Hook to verify schema compatibility with a Cassandra table.

Verifies that every column present in the dataframe exists in the target Cassandra table and has the same type.

cassandra_client

client to connect to Cassandra DB.

table

table name.

run(dataframe: DataFrame) DataFrame

Check the schema compatibility of a given DataFrame.

This method does not modify the DataFrame.

Parameters:

dataframe – dataframe to verify schema compatibility.

Returns:

unchanged dataframe.

Raises:

ValueError – if the schemas are incompatible.

class butterfree.hooks.schema_compatibility.SparkTableSchemaCompatibilityHook(spark_client: SparkClient, table: str, database: Optional[str] = None)

Bases: Hook

Hook to verify schema compatibility with a Spark table.

Verifies that every column present in the dataframe exists in the target Spark table and has the same type.

spark_client

client to connect to Spark’s metastore.

table

table name.

database

database name.

run(dataframe: DataFrame) DataFrame

Check the schema compatibility of a given DataFrame.

This method does not modify the DataFrame.

Parameters:

dataframe – dataframe to verify schema compatibility.

Returns:

unchanged dataframe.

Raises:

ValueError – if the schemas are incompatible.