butterfree.load package¶
Subpackages¶
- butterfree.load.processing package
- butterfree.load.writers package
- Submodules
HistoricalFeatureStoreWriter
HistoricalFeatureStoreWriter.db_config
HistoricalFeatureStoreWriter.database
HistoricalFeatureStoreWriter.num_partitions
HistoricalFeatureStoreWriter.validation_threshold
HistoricalFeatureStoreWriter.debug_mode
HistoricalFeatureStoreWriter.DEFAULT_VALIDATION_THRESHOLD
HistoricalFeatureStoreWriter.PARTITION_BY
HistoricalFeatureStoreWriter.check_schema()
HistoricalFeatureStoreWriter.validate()
HistoricalFeatureStoreWriter.write()
OnlineFeatureStoreWriter
OnlineFeatureStoreWriter.db_config
OnlineFeatureStoreWriter.debug_mode
OnlineFeatureStoreWriter.write_to_entity
OnlineFeatureStoreWriter.check_schema()
OnlineFeatureStoreWriter.filter_latest()
OnlineFeatureStoreWriter.get_db_schema()
OnlineFeatureStoreWriter.validate()
OnlineFeatureStoreWriter.write()
Writer
- Module contents
HistoricalFeatureStoreWriter
HistoricalFeatureStoreWriter.db_config
HistoricalFeatureStoreWriter.database
HistoricalFeatureStoreWriter.num_partitions
HistoricalFeatureStoreWriter.validation_threshold
HistoricalFeatureStoreWriter.debug_mode
HistoricalFeatureStoreWriter.DEFAULT_VALIDATION_THRESHOLD
HistoricalFeatureStoreWriter.PARTITION_BY
HistoricalFeatureStoreWriter.check_schema()
HistoricalFeatureStoreWriter.transformations
HistoricalFeatureStoreWriter.validate()
HistoricalFeatureStoreWriter.write()
OnlineFeatureStoreWriter
OnlineFeatureStoreWriter.db_config
OnlineFeatureStoreWriter.debug_mode
OnlineFeatureStoreWriter.write_to_entity
OnlineFeatureStoreWriter.check_schema()
OnlineFeatureStoreWriter.filter_latest()
OnlineFeatureStoreWriter.get_db_schema()
OnlineFeatureStoreWriter.validate()
OnlineFeatureStoreWriter.write()
- Submodules
Submodules¶
Holds the Sink class.
- class butterfree.load.sink.Sink(writers: List[Writer], validation: Optional[Validation] = None)¶
Bases:
HookableComponent
Define the destinations for the feature set pipeline.
A Sink is created from a set of writers. The main goal of the Sink is to trigger the load in each defined writer. After the load, the entity can be used to verify that all data was written properly via the validate method.
- writers¶
list of Writers to use to load the data.
- validation¶
validation to check the data before starting to write.
- flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) List[StreamingQuery] ¶
Trigger a write job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Returns:
Streaming handlers for each defined writer, if writing streaming DataFrames.
- validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) None ¶
Trigger a validation job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Raises:
RuntimeError – if any of the Writers returns a failed validation.
- property validation: Optional[Validation]¶
Validation to check the data before starting to write.
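The flush/validate fan-out described above can be illustrated with a minimal, self-contained sketch. Note that `ToyWriter` and `ToySink` are hypothetical stand-ins for butterfree's `Writer` and `Sink` (which operate on real Spark DataFrames and a `SparkClient`); only the orchestration pattern, write to every writer, then aggregate validation failures into a `RuntimeError`, mirrors the documented behavior.

```python
from typing import Any, List

# Hypothetical stand-in for a butterfree Writer; the real Sink
# delegates to Writer.write() and Writer.validate().
class ToyWriter:
    def __init__(self, name: str):
        self.name = name
        self.written: List[Any] = []

    def write(self, feature_set, dataframe, spark_client=None):
        # Record what was loaded so validate() can check it later.
        self.written.append((feature_set, dataframe))

    def validate(self, feature_set, dataframe, spark_client=None):
        if not self.written:
            raise AssertionError(f"{self.name}: nothing was written")


class ToySink:
    """Sketch of Sink's pattern: trigger the load in each defined writer,
    then validate that all data was written properly."""

    def __init__(self, writers: List[ToyWriter]):
        self.writers = writers

    def flush(self, feature_set, dataframe, spark_client=None):
        for writer in self.writers:
            writer.write(feature_set, dataframe, spark_client)

    def validate(self, feature_set, dataframe, spark_client=None):
        failures = []
        for writer in self.writers:
            try:
                writer.validate(feature_set, dataframe, spark_client)
            except Exception as exc:
                failures.append(str(exc))
        if failures:
            # As documented: a failed validation surfaces as RuntimeError.
            raise RuntimeError("; ".join(failures))


sink = ToySink([ToyWriter("historical"), ToyWriter("online")])
sink.flush("user_features", [{"id": 1}])
sink.validate("user_features", [{"id": 1}])  # both writers wrote, so no error
```

In the real pipeline, `flush` is called with a `FeatureSet`, a Spark `DataFrame`, and a `SparkClient`, and returns streaming handlers when the DataFrame is a streaming one.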
Module contents¶
Holds the Sink component of a feature set pipeline.
- class butterfree.load.Sink(writers: List[Writer], validation: Optional[Validation] = None)¶
Bases:
HookableComponent
Define the destinations for the feature set pipeline.
A Sink is created from a set of writers. The main goal of the Sink is to trigger the load in each defined writer. After the load, the entity can be used to verify that all data was written properly via the validate method.
- writers¶
list of Writers to use to load the data.
- validation¶
validation to check the data before starting to write.
- flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) List[StreamingQuery] ¶
Trigger a write job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Returns:
Streaming handlers for each defined writer, if writing streaming DataFrames.
- validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) None ¶
Trigger a validation job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Raises:
RuntimeError – if any of the Writers returns a failed validation.
- property validation: Optional[Validation]¶
Validation to check the data before starting to write.