butterfree.load package¶
Subpackages¶
- butterfree.load.processing package
- butterfree.load.writers package
- Submodules
HistoricalFeatureStoreWriterHistoricalFeatureStoreWriter.db_configHistoricalFeatureStoreWriter.databaseHistoricalFeatureStoreWriter.num_partitionsHistoricalFeatureStoreWriter.validation_thresholdHistoricalFeatureStoreWriter.debug_modeHistoricalFeatureStoreWriter.DEFAULT_VALIDATION_THRESHOLDHistoricalFeatureStoreWriter.PARTITION_BYHistoricalFeatureStoreWriter.check_schema()HistoricalFeatureStoreWriter.validate()HistoricalFeatureStoreWriter.write()
OnlineFeatureStoreWriterOnlineFeatureStoreWriter.db_configOnlineFeatureStoreWriter.debug_modeOnlineFeatureStoreWriter.write_to_entityOnlineFeatureStoreWriter.check_schema()OnlineFeatureStoreWriter.filter_latest()OnlineFeatureStoreWriter.get_db_schema()OnlineFeatureStoreWriter.validate()OnlineFeatureStoreWriter.write()
Writer
- Module contents
HistoricalFeatureStoreWriterHistoricalFeatureStoreWriter.db_configHistoricalFeatureStoreWriter.databaseHistoricalFeatureStoreWriter.num_partitionsHistoricalFeatureStoreWriter.validation_thresholdHistoricalFeatureStoreWriter.debug_modeHistoricalFeatureStoreWriter.DEFAULT_VALIDATION_THRESHOLDHistoricalFeatureStoreWriter.PARTITION_BYHistoricalFeatureStoreWriter.check_schema()HistoricalFeatureStoreWriter.transformationsHistoricalFeatureStoreWriter.validate()HistoricalFeatureStoreWriter.write()
OnlineFeatureStoreWriterOnlineFeatureStoreWriter.db_configOnlineFeatureStoreWriter.debug_modeOnlineFeatureStoreWriter.write_to_entityOnlineFeatureStoreWriter.check_schema()OnlineFeatureStoreWriter.filter_latest()OnlineFeatureStoreWriter.get_db_schema()OnlineFeatureStoreWriter.validate()OnlineFeatureStoreWriter.write()
- Submodules
Submodules¶
Holds the Sink class.
- class butterfree.load.sink.Sink(writers: List[Writer], validation: Optional[Validation] = None)¶
Bases:
HookableComponentDefine the destinations for the feature set pipeline.
A Sink is created from a set of writers. The main goal of the Sink is to trigger the load in each defined writers. After the load the entity can be used to make sure that all data was written properly using the validate method.
- writers¶
list of Writers to use to load the data.
- validation¶
validation to check the data before starting to write.
- flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) List[StreamingQuery]¶
Trigger a write job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Returns:
Streaming handlers for each defined writer, if writing streaming dfs.
- validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) None¶
Trigger a validation job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Raises:
RuntimeError – if any on the Writers returns a failed validation.
- property validation: Optional[Validation]¶
Validation to check the data before starting to write.
Module contents¶
Holds the Sink component of a feature set pipeline.
- class butterfree.load.Sink(writers: List[Writer], validation: Optional[Validation] = None)¶
Bases:
HookableComponentDefine the destinations for the feature set pipeline.
A Sink is created from a set of writers. The main goal of the Sink is to trigger the load in each defined writers. After the load the entity can be used to make sure that all data was written properly using the validate method.
- writers¶
list of Writers to use to load the data.
- validation¶
validation to check the data before starting to write.
- flush(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) List[StreamingQuery]¶
Trigger a write job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Returns:
Streaming handlers for each defined writer, if writing streaming dfs.
- validate(feature_set: FeatureSet, dataframe: DataFrame, spark_client: SparkClient) None¶
Trigger a validation job in all the defined Writers.
- Parameters:
dataframe – spark dataframe containing data from a feature set.
feature_set – object processed with feature set metadata.
spark_client – client used to run a query.
- Raises:
RuntimeError – if any on the Writers returns a failed validation.
- property validation: Optional[Validation]¶
Validation to check the data before starting to write.