butterfree.load package

Subpackages

butterfree.load.writers package

Submodules

butterfree.load.sink module

Holds the Sink class.

class butterfree.load.sink.Sink(writers: List[Writer], validation: Optional[Validation] = None)

Bases: HookableComponent

Define the destinations for the feature set pipeline.

A Sink is created from a set of writers. Its main goal is to trigger the load in each defined writer. After the load, the validate method can be used to make sure that all data was written properly.

writers

list of Writers to use to load the data.

validation

validation to check the data before starting to write.
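A minimal construction sketch. The two writer classes are butterfree's built-in destinations and are assumed here to be importable from butterfree.load.writers; their default configs typically read connection settings from the environment:

    from butterfree.load import Sink
    from butterfree.load.writers import (
        HistoricalFeatureStoreWriter,
        OnlineFeatureStoreWriter,
    )

    # One writer per destination; the Sink fans each load out to all of them.
    sink = Sink(
        writers=[
            HistoricalFeatureStoreWriter(),  # historical (e.g. S3/metastore) destination
            OnlineFeatureStoreWriter(),      # online (e.g. Cassandra) destination
        ]
    )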

flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) → List[StreamingQuery]

Trigger a write job in all the defined Writers.

Parameters:
  • feature_set – object processed with feature set metadata.

  • dataframe – Spark dataframe containing data from a feature set.

  • spark_client – client used to run a query.

Returns:

Streaming handlers for each defined writer, if writing streaming dataframes.
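A hedged sketch of calling flush directly; feature_set, dataframe, and the sink built above are assumed to exist already, and SparkClient is butterfree's Spark session wrapper from butterfree.clients:

    from butterfree.clients import SparkClient

    spark_client = SparkClient()

    # Trigger the write job in every writer attached to the sink.
    handlers = sink.flush(
        feature_set=feature_set,  # assumed: a built FeatureSet
        dataframe=dataframe,      # assumed: its transformed Spark DataFrame
        spark_client=spark_client,
    )

    # Only streaming writes produce handlers (see Returns above); block on
    # each one if the job should run until the stream stops.
    for query in handlers:
        query.awaitTermination()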

validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) → None

Trigger a validation job in all the defined Writers.

Parameters:
  • feature_set – object processed with feature set metadata.

  • dataframe – Spark dataframe containing data from a feature set.

  • spark_client – client used to run a query.

Raises:

RuntimeError – if any of the Writers returns a failed validation.
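Validation failures surface as a single RuntimeError, so one try/except is enough to catch a failure from any writer. A sketch, reusing the sink, feature_set, dataframe, and spark_client names assumed above:

    try:
        sink.validate(
            feature_set=feature_set,
            dataframe=dataframe,
            spark_client=spark_client,
        )
    except RuntimeError as err:
        # At least one writer reported a failed validation.
        print(f"Sink validation failed: {err}")
        raise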

property validation: Optional[Validation]

Validation to check the data before starting to write.

property writers: List[Writer]

List of Writers to use to load the data.

Module contents

Holds the Sink component of a feature set pipeline.
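The package re-exports the same Sink documented under butterfree.load.sink above, so both import paths should resolve to one object; a quick sanity check, assuming a plain re-export:

    from butterfree.load import Sink
    from butterfree.load.sink import Sink as SinkFromModule

    # Both names point at the same class object.
    assert Sink is SinkFromModule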

class butterfree.load.Sink(writers: List[Writer], validation: Optional[Validation] = None)

Alias of butterfree.load.sink.Sink, re-exported at the package level. See the full documentation under butterfree.load.sink above.