butterfree.load package

Subpackages

butterfree.load.writers package

Submodules

butterfree.load.sink module

Holds the Sink class.

class butterfree.load.sink.Sink(writers: List[Writer], validation: Optional[Validation] = None)

Bases: HookableComponent

Define the destinations for the feature set pipeline.

A Sink is created from a set of writers. Its main goal is to trigger the load in each defined writer. After the load, the validate method can be used to make sure that all data was written properly.

writers

list of Writers to use to load the data.

validation

validation to check the data before starting to write.
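A minimal construction sketch. The two writer classes are butterfree's built-in destinations and are assumed here to be importable from butterfree.load.writers; their default configs typically read connection settings from the environment:

    from butterfree.load import Sink
    from butterfree.load.writers import (
        HistoricalFeatureStoreWriter,
        OnlineFeatureStoreWriter,
    )

    # One writer per destination; the Sink fans each load out to all of them.
    sink = Sink(
        writers=[
            HistoricalFeatureStoreWriter(),  # historical (e.g. S3/metastore) destination
            OnlineFeatureStoreWriter(),      # online (e.g. Cassandra) destination
        ]
    )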

flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) → List[StreamingQuery]

Trigger a write job in all the defined Writers.

Parameters:
  • feature_set – object processed with feature set metadata.

  • dataframe – Spark dataframe containing data from a feature set.

  • spark_client – client used to run a query.

Returns:

Streaming handlers for each defined writer, if writing streaming dataframes.
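A hedged sketch of calling flush directly; feature_set, dataframe, and the sink built above are assumed to exist already, and SparkClient is butterfree's Spark session wrapper from butterfree.clients:

    from butterfree.clients import SparkClient

    spark_client = SparkClient()

    # Trigger the write job in every writer attached to the sink.
    handlers = sink.flush(
        feature_set=feature_set,  # assumed: a built FeatureSet
        dataframe=dataframe,      # assumed: its transformed Spark DataFrame
        spark_client=spark_client,
    )

    # Only streaming writes produce handlers (see Returns above); block on
    # each one if the job should run until the stream stops.
    for query in handlers:
        query.awaitTermination()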

validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) → None

Trigger a validation job in all the defined Writers.

Parameters:
  • feature_set – object processed with feature set metadata.

  • dataframe – Spark dataframe containing data from a feature set.

  • spark_client – client used to run a query.

Raises:

RuntimeError – if any of the Writers returns a failed validation.
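Validation failures surface as a single RuntimeError, so one try/except is enough to catch a failure from any writer. A sketch, reusing the sink, feature_set, dataframe, and spark_client names assumed above:

    try:
        sink.validate(
            feature_set=feature_set,
            dataframe=dataframe,
            spark_client=spark_client,
        )
    except RuntimeError as err:
        # At least one writer reported a failed validation.
        print(f"Sink validation failed: {err}")
        raise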

property validation: Optional[Validation]

Validation to check the data before starting to write.

property writers: List[Writer]

List of Writers to use to load the data.

Module contents

Holds the Sink component of a feature set pipeline.
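The package re-exports the same Sink documented under butterfree.load.sink above, so both import paths should resolve to one object; a quick sanity check, assuming a plain re-export:

    from butterfree.load import Sink
    from butterfree.load.sink import Sink as SinkFromModule

    # Both names point at the same class object.
    assert Sink is SinkFromModule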

class butterfree.load.Sink(writers: List[Writer], validation: Optional[Validation] = None)

Alias of butterfree.load.sink.Sink, re-exported at the package level. See the full documentation under butterfree.load.sink above.