butterfree.configs.db package¶
Submodules¶
Abstract classes for database configurations with spark.
- class butterfree.configs.db.abstract_config.AbstractWriteConfig¶
Bases:
ABC
Abstract class for database write configurations with spark.
- abstract property database: str¶
Database name.
- abstract property format_: Any¶
Config option “format” for spark write.
- Returns:
format.
- Return type:
str
- abstract property mode: Any¶
Config option “mode” for spark write.
- Returns:
mode.
- Return type:
str
- abstract translate(schema: Any) → List[Dict[Any, Any]]¶
Translate feature set spark schema to the corresponding database.
- Parameters:
schema – feature set schema
- Returns:
Corresponding database schema.
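To illustrate the contract above, here is a minimal sketch of a concrete write configuration. The `AbstractWriteConfig` stand-in below mirrors only the interface documented here, and the `JsonWriteConfig` subclass with its pass-through `translate` is hypothetical, not part of butterfree.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class AbstractWriteConfig(ABC):
    """Stand-in mirroring the abstract interface documented above."""

    @property
    @abstractmethod
    def database(self) -> str:
        """Database name."""

    @property
    @abstractmethod
    def format_(self) -> Any:
        """Config option "format" for spark write."""

    @property
    @abstractmethod
    def mode(self) -> Any:
        """Config option "mode" for spark write."""

    @abstractmethod
    def translate(self, schema: Any) -> List[Dict[Any, Any]]:
        """Translate feature set spark schema to the corresponding database."""


class JsonWriteConfig(AbstractWriteConfig):
    """Hypothetical concrete config for writing JSON files."""

    @property
    def database(self) -> str:
        return "json"

    @property
    def format_(self) -> str:
        return "json"

    @property
    def mode(self) -> str:
        return "overwrite"

    def translate(self, schema: Any) -> List[Dict[Any, Any]]:
        # Pass the spark schema through unchanged; a real config would
        # map spark types to the target database's types here.
        return [dict(entry) for entry in schema]


config = JsonWriteConfig()
```

A concrete subclass must implement all three properties plus `translate`, otherwise instantiation raises `TypeError`.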
Holds configurations to read and write with Spark to Cassandra DB.
- class butterfree.configs.db.cassandra_config.CassandraConfig(username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, keyspace: Optional[str] = None, mode: Optional[str] = None, format_: Optional[str] = None, stream_processing_time: Optional[str] = None, stream_output_mode: Optional[str] = None, stream_checkpoint_path: Optional[str] = None, read_consistency_level: Optional[str] = None, write_consistency_level: Optional[str] = None, local_dc: Optional[str] = None)¶
Bases:
AbstractWriteConfig
Configuration for Spark to connect to Cassandra DB.
References can be found [here](https://docs.databricks.com/data/data-sources/cassandra.html).
- username¶
username to use in connection.
- password¶
password to use in connection.
- host¶
host to use in connection.
- keyspace¶
Cassandra DB keyspace to write data.
- mode¶
write mode for Spark.
- format_¶
write format for Spark.
- stream_processing_time¶
processing time interval for streaming jobs.
- stream_output_mode¶
specify the mode for writing streaming data.
- stream_checkpoint_path¶
path on S3 to save checkpoints for the stream job.
- read_consistency_level¶
read consistency level used in connection.
- write_consistency_level¶
write consistency level used in connection.
More information about processing_time, output_mode and checkpoint_path can be found in the Spark documentation: [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
- property database: str¶
Database name.
- property format_: Optional[str]¶
Write format for Spark.
- get_options(table: str) → Dict[Optional[str], Optional[str]]¶
Get options for connecting to Cassandra DB.
Options will be a dictionary with the write and read configuration for Spark to Cassandra.
- Parameters:
table – table name in Cassandra DB (within the configured keyspace).
- Returns:
Configuration to connect to Cassandra DB.
- property host: Optional[str]¶
Host used in connection to Cassandra DB.
- property keyspace: Optional[str]¶
Cassandra DB keyspace to write data.
- property local_dc: Optional[str]¶
Local DC for Cassandra connection.
- property mode: Optional[str]¶
Write mode for Spark.
- property password: Optional[str]¶
Password used in connection to Cassandra DB.
- property read_consistency_level: Optional[str]¶
Read consistency level for Cassandra.
- property stream_checkpoint_path: Optional[str]¶
Path on S3 to save checkpoints for the stream job.
- property stream_output_mode: Optional[str]¶
Specify the mode for writing streaming data.
- property stream_processing_time: Optional[str]¶
Processing time interval for streaming jobs.
- translate(schema: List[Dict[str, Any]]) → List[Dict[str, Any]]¶
Translate the feature set schema to the corresponding Cassandra schema.
The output will be a list of dictionaries describing the Cassandra database schema.
- Parameters:
schema – feature set schema in spark.
- Returns:
Cassandra schema.
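As a concrete illustration of the translate contract, here is a hypothetical Spark-to-Cassandra type mapping; the mapping table and the `primary_key` field below are illustrative assumptions, and the actual translation butterfree applies may differ.

```python
# Hypothetical Spark-to-Cassandra type mapping illustrating the shape of
# translate's input and output; butterfree's real mapping may differ.
SPARK_TO_CASSANDRA = {
    "IntegerType": "int",
    "StringType": "text",
    "DoubleType": "double",
}


def translate(schema):
    """Map a list of spark column dicts to Cassandra column dicts."""
    return [
        {
            "column_name": col["column_name"],
            "type": SPARK_TO_CASSANDRA[col["type"]],
            "primary_key": col.get("primary_key", False),
        }
        for col in schema
    ]


cassandra_schema = translate(
    [
        {"column_name": "id", "type": "IntegerType", "primary_key": True},
        {"column_name": "feature", "type": "DoubleType"},
    ]
)
```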
- property username: Optional[str]¶
Username used in connection to Cassandra DB.
- property write_consistency_level: Optional[str]¶
Write consistency level for Cassandra.
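The option map that `get_options` assembles can be sketched by hand. The key names below follow the spark-cassandra-connector documentation (`spark.cassandra.connection.host`, `spark.cassandra.auth.*`); the exact key set butterfree emits may differ, and the `cassandra_options` helper is hypothetical.

```python
# Hand-built sketch of the kind of option map CassandraConfig.get_options
# assembles for Spark's Cassandra connector. Key names follow the
# spark-cassandra-connector docs; butterfree's exact output may differ.
def cassandra_options(username, password, host, keyspace, table):
    return {
        "table": table,
        "keyspace": keyspace,
        "spark.cassandra.connection.host": host,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }


options = cassandra_options(
    "user", "secret", "127.0.0.1", "feature_store", "my_feature_set"
)
```

Such a dictionary would typically be handed to Spark via `.options(**options)` on a reader or writer.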
Holds configurations to read and write with Spark to Kafka.
- class butterfree.configs.db.kafka_config.KafkaConfig(kafka_topic: Optional[str] = None, kafka_connection_string: Optional[str] = None, mode: Optional[str] = None, format_: Optional[str] = None, stream_processing_time: Optional[str] = None, stream_output_mode: Optional[str] = None, stream_checkpoint_path: Optional[str] = None)¶
Bases:
AbstractWriteConfig
Configuration for Spark to connect to Kafka.
- kafka_topic¶
string with kafka topic name.
- kafka_connection_string¶
string with hosts and ports to connect.
- mode¶
write mode for Spark.
- format_¶
write format for Spark.
- stream_processing_time¶
processing time interval for streaming jobs.
- stream_output_mode¶
specify the mode for writing streaming data.
- stream_checkpoint_path¶
path on S3 to save checkpoints for the stream job.
More information about processing_time, output_mode and checkpoint_path can be found in the Spark documentation: [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
- property database: str¶
Database name.
- property format_: Optional[str]¶
Write format for Spark.
- get_options(topic: str) → Dict[Optional[str], Optional[str]]¶
Get options for connecting to Kafka.
Options will be a dictionary with the write and read configuration for Spark to Kafka.
- Parameters:
topic – topic related to Kafka.
- Returns:
Configuration to connect to Kafka.
- property kafka_connection_string: Optional[str]¶
Kafka connection string with hosts and ports to connect.
- property kafka_topic: Optional[str]¶
Kafka topic name.
- property mode: Optional[str]¶
Write mode for Spark.
- property stream_checkpoint_path: Optional[str]¶
Path on S3 to save checkpoints for the stream job.
- property stream_output_mode: Optional[str]¶
Specify the mode for writing streaming data.
- property stream_processing_time: Optional[str]¶
Processing time interval for streaming jobs.
- translate(schema: List[Dict[str, Any]]) → List[Dict[str, Any]]¶
Translate the feature set schema to the corresponding Kafka schema.
The output will be a list of dictionaries describing the Kafka schema.
- Parameters:
schema – feature set schema in spark.
- Returns:
Kafka schema.
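As with Cassandra, the Kafka option map can be sketched by hand. `kafka.bootstrap.servers` and `topic` are the standard option names from Spark's Kafka integration guide; the `kafka_options` helper is hypothetical and butterfree's exact output may differ.

```python
# Hand-built sketch of the option map KafkaConfig.get_options would
# produce for Spark's Kafka sink; butterfree's exact output may differ.
def kafka_options(connection_string, topic):
    return {
        "kafka.bootstrap.servers": connection_string,
        "topic": topic,
    }


options = kafka_options("broker-1:9092,broker-2:9092", "events")
```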
Holds configurations to read and write with Spark to AWS S3.
- class butterfree.configs.db.metastore_config.MetastoreConfig(path: Optional[str] = None, mode: Optional[str] = None, format_: Optional[str] = None, file_system: Optional[str] = None)¶
Bases:
AbstractWriteConfig
Configuration for where the Spark metastore database is stored.
By default, the configuration targets AWS S3.
- path¶
database root location.
- mode¶
writing mode used by writers.
- format_¶
expected stored file format.
- file_system¶
file system scheme URI, such as s3a or file.
- property database: str¶
Database name.
- property file_system: Optional[str]¶
File system scheme URI, such as s3a or file.
- property format_: Optional[str]¶
Expected stored file format.
- get_options(key: str) → Dict[Optional[str], Optional[str]]¶
Get options for Metastore.
Options will be a dictionary with the write and read configuration for Spark Metastore.
- Parameters:
key – path to save data into Metastore.
- Returns:
Options configuration for Metastore.
- get_path_with_partitions(key: str, dataframe: DataFrame) → List¶
Get options for AWS S3 from partitioned parquet file.
Options will be a dictionary with the write and read configuration for Spark to AWS S3.
- Parameters:
key – path to save data into AWS S3 bucket.
dataframe – spark dataframe containing data from a feature set.
- Returns:
A list of strings for file-system-backed data sources.
- property mode: Optional[str]¶
Writing mode used by writers.
- property path: Optional[str]¶
Bucket name.
- translate(schema: List[Dict[str, Any]]) → List[Dict[str, Any]]¶
Translate feature set spark schema to the corresponding database.
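The partitioned paths that `get_path_with_partitions` returns can be sketched as Hive-style `column=value` suffixes under the configured root. The `paths_with_partitions` helper, the bucket name, and the partition columns below are hypothetical; butterfree derives the actual partition values from the dataframe.

```python
# Hypothetical sketch of how partitioned paths like the ones
# get_path_with_partitions returns could be assembled: one path per
# distinct partition value, under a root such as "s3a://bucket/db/table".
def paths_with_partitions(root, partitions):
    """partitions: list of dicts mapping partition column -> value."""
    paths = []
    for part in partitions:
        suffix = "/".join(f"{col}={val}" for col, val in part.items())
        paths.append(f"{root}/{suffix}")
    return paths


paths = paths_with_partitions(
    "s3a://my-bucket/feature_store/my_feature_set",
    [{"year": 2021, "month": 1, "day": 1}, {"year": 2021, "month": 1, "day": 2}],
)
```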
Module contents¶
This module holds database configurations to be used by clients.
The classes AbstractWriteConfig, CassandraConfig, KafkaConfig and MetastoreConfig are re-exported at the package level; their documentation is listed under the submodules above.