The Load step is handled by the Sink, where we define the destinations of the feature set pipeline; that is, it is the process of writing the transformed data after the transformation step.
Declaring the sink:

sink = Sink(
    writers=[HistoricalFeatureStoreWriter(), OnlineFeatureStoreWriter()]
)
Currently, you can write your data using two types of writers:
HistoricalFeatureStoreWriter: The Historical Feature Store will write the data to an AWS S3 bucket.
OnlineFeatureStoreWriter: The Online Feature Store will write the data to a Cassandra database.
If you declare your writers without a database configuration, they will use their default settings. But you can also define this configuration yourself, for example:
config = S3Config(bucket="my_bucket", mode="append", format_="parquet")
writers = [HistoricalFeatureStoreWriter(db_config=config)]
config = CassandraConfig(
    mode="overwrite",
    format_="org.apache.spark.sql.cassandra",
    keyspace="keyspace_name",
)
writers = [OnlineFeatureStoreWriter(db_config=config)]
It’s also important to highlight that our writers support a debug mode:
writers = [
    HistoricalFeatureStoreWriter(debug_mode=True),
    OnlineFeatureStoreWriter(debug_mode=True),
]
sink = Sink(writers=writers)
When debug_mode is set to True, a temporary view is created instead, so no data is actually saved to either the historical or the online feature store. Feel free to check our examples section to learn more about how to use this mode.
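To make the sink/writer pattern concrete, here is a minimal, self-contained sketch of the idea: a sink fans the transformed data out to a list of writers, and each writer either persists the data or, in debug mode, only registers a temporary view. The class names, method names, and data below are simplified stand-ins for illustration, not the real library API.

```python
# Simplified sketch of the sink/writer pattern described above.
# All names here are illustrative stand-ins, NOT the real library API.

class Writer:
    """Writes a dataframe to a destination, or, in debug mode,
    only registers it as a temporary view."""

    destination = "unknown"

    def __init__(self, debug_mode=False):
        self.debug_mode = debug_mode

    def write(self, dataframe, views):
        if self.debug_mode:
            # Debug mode: register a temporary view, persist nothing.
            views[f"{self.destination}_view"] = dataframe
            return f"created temporary view for {self.destination}"
        return f"wrote {len(dataframe)} rows to {self.destination}"


class HistoricalWriter(Writer):
    destination = "s3_bucket"


class OnlineWriter(Writer):
    destination = "cassandra"


class Sink:
    """Fans the transformed data out to every configured writer."""

    def __init__(self, writers):
        self.writers = writers

    def flush(self, dataframe, views):
        return [w.write(dataframe, views) for w in self.writers]


data = [{"id": 1, "feature": 0.5}, {"id": 2, "feature": 0.9}]
views = {}

sink = Sink(writers=[HistoricalWriter(), OnlineWriter(debug_mode=True)])
results = sink.flush(data, views)
# results[0] -> "wrote 2 rows to s3_bucket"
# results[1] -> "created temporary view for cassandra"
```

Note how the debug-mode writer leaves the "database" untouched and only populates the temporary-view registry, mirroring the behavior described above.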