butterfree.configs.db package

Submodules

butterfree.configs.db.abstract_config module

Abstract classes for database configurations with Spark.

class butterfree.configs.db.abstract_config.AbstractWriteConfig

Bases: abc.ABC

Abstract class for database write configurations with Spark.

abstract property format_

Config option “format” for Spark write.

Returns

format.

Return type

str

abstract get_options(*args, **kwargs) → dict

Get connection options configuration defined in the entity.

Parameters
  • *args – args to use in the options

  • **kwargs – kwargs to use in the options

Returns

Connection options configuration.

abstract property mode

Config option “mode” for Spark write.

Returns

mode.

Return type

str

abstract translate(schema) → List[Dict]

Translate a feature set Spark schema into the corresponding database schema.

Parameters

schema – feature set schema

Returns

Corresponding database schema.
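The abstract interface above can be illustrated with a minimal, self-contained sketch. To stay runnable without butterfree or Spark installed, it mirrors the documented members (`format_`, `mode`, `get_options`, `translate`) in a local stand-in ABC instead of importing `AbstractWriteConfig`. The names `WriteConfigSketch` and `ParquetWriteConfig`, the `path` option and the schema dictionaries are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class WriteConfigSketch(ABC):
    """Hypothetical stand-in mirroring the documented AbstractWriteConfig members."""

    @property
    @abstractmethod
    def format_(self) -> str:
        """Config option "format" for Spark write."""

    @property
    @abstractmethod
    def mode(self) -> str:
        """Config option "mode" for Spark write."""

    @abstractmethod
    def get_options(self, *args: Any, **kwargs: Any) -> dict:
        """Connection options for the target database."""

    @abstractmethod
    def translate(self, schema: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Translate a feature set schema into the target database schema."""


class ParquetWriteConfig(WriteConfigSketch):
    """Hypothetical concrete config writing Parquet files under a root path."""

    def __init__(self, root: str, mode: str = "overwrite") -> None:
        self._root = root.rstrip("/")
        self._mode = mode

    @property
    def format_(self) -> str:
        return "parquet"

    @property
    def mode(self) -> str:
        return self._mode

    def get_options(self, key: str) -> dict:
        # Hypothetical: a single "path" option built from the root and a key.
        return {"path": f"{self._root}/{key}"}

    def translate(self, schema: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Hypothetical: keep only each column's name and type.
        return [{"column_name": c["column_name"], "type": c["type"]} for c in schema]


config = ParquetWriteConfig("/tmp/features/")
print(config.format_, config.mode, config.get_options("my_feature_set"))
```

Because `format_` and `mode` are abstract properties, a subclass that forgets either one cannot be instantiated, which is what lets writers rely on every config exposing the same four members.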

butterfree.configs.db.cassandra_config module

Holds configurations to read and write with Spark to Cassandra DB.

class butterfree.configs.db.cassandra_config.CassandraConfig(username: str = None, password: str = None, host: str = None, keyspace: str = None, mode: str = None, format_: str = None, stream_processing_time: str = None, stream_output_mode: str = None, stream_checkpoint_path: str = None)

Bases: butterfree.configs.db.abstract_config.AbstractWriteConfig

Configuration for Spark to connect to Cassandra DB.

References can be found [here](https://docs.databricks.com/data/data-sources/cassandra.html).

username

username to use in connection.

password

password to use in connection.

host

host to use in connection.

keyspace

Cassandra DB keyspace to write data.

mode

write mode for Spark.

format_

write format for Spark.

stream_processing_time

processing time interval for streaming jobs.

stream_output_mode

output mode used when writing streaming data.

stream_checkpoint_path

path on S3 to save checkpoints for the stream job.

More information about processing_time, output_mode and checkpoint_path can be found in Spark documentation: [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
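The three streaming attributes correspond to standard Structured Streaming settings: a processing-time trigger interval, an output mode (Spark accepts `append`, `complete` and `update`), and the `checkpointLocation` option. The sketch below collects them into a plain dictionary and validates the output mode. The helper `stream_settings` and the keys it returns are assumptions for illustration, not butterfree's own code.

```python
from typing import Dict, Optional

# Output modes accepted by Spark Structured Streaming's DataStreamWriter.outputMode.
VALID_OUTPUT_MODES = {"append", "complete", "update"}


def stream_settings(
    processing_time: Optional[str],
    output_mode: Optional[str],
    checkpoint_path: Optional[str],
) -> Dict[str, str]:
    """Hypothetical helper collecting the stream_* attributes into one dict."""
    settings: Dict[str, str] = {}
    if processing_time is not None:
        # Would be passed to writeStream.trigger(processingTime=...), e.g. "0 seconds".
        settings["processingTime"] = processing_time
    if output_mode is not None:
        if output_mode not in VALID_OUTPUT_MODES:
            raise ValueError(f"unsupported output mode: {output_mode!r}")
        settings["outputMode"] = output_mode
    if checkpoint_path is not None:
        # Would be passed as writeStream.option("checkpointLocation", ...).
        settings["checkpointLocation"] = checkpoint_path
    return settings
```

In PySpark these values would typically be applied as `df.writeStream.trigger(processingTime=...).outputMode(...).option("checkpointLocation", ...)`. Validating the output mode up front means a typo fails before any streaming query is started.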

property format_

Write format for Spark.

get_options(table: str) → dict

Get options for connecting to Cassandra DB.

Options will be a dictionary with the write and read configuration for Spark to Cassandra.

Parameters

table – table name within the Cassandra DB keyspace.

Returns

Configuration to connect to Cassandra DB.
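The Spark Cassandra Connector reads and writes through the `org.apache.spark.sql.cassandra` data source and identifies the target with `keyspace` and `table` options, while host and credentials are recognized as `spark.cassandra.*` settings. The sketch below builds dictionaries of that shape. The helper names are hypothetical, and whether butterfree's `get_options` returns exactly these keys is an assumption to check against its source.

```python
from typing import Dict

# Data source name used by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(keyspace: str, table: str) -> Dict[str, str]:
    """Hypothetical helper: per-read/write options identifying the target table."""
    return {"keyspace": keyspace, "table": table}


def cassandra_session_conf(host: str, username: str, password: str) -> Dict[str, str]:
    """Hypothetical helper: connector settings typically set on the SparkSession."""
    return {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }
```

With PySpark, the per-write options could be used as `df.write.format(CASSANDRA_FORMAT).mode("append").options(**cassandra_options("feature_store", "my_feature_set")).save()`, where `feature_store` and `my_feature_set` are placeholder names.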

property host

Host used in connection to Cassandra DB.

property keyspace

Cassandra DB keyspace to write data.

property mode

Write mode for Spark.

property password

Password used in connection to Cassandra DB.

property stream_checkpoint_path

Path on S3 to save checkpoints for the stream job.

property stream_output_mode

Output mode used when writing streaming data.

property stream_processing_time

Processing time interval for streaming jobs.

translate(schema) → List[Dict]

Translate the feature set schema into a Cassandra schema.

The output will be a list of dictionaries describing the Cassandra database schema.

Parameters

schema – feature set schema in spark.

Returns

Cassandra schema.
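As a rough illustration of the translation step, the sketch maps Spark SQL type names to CQL types and emits one dictionary per column. The input shape (`column_name`, `type` and `primary_key` keys, with types given as strings), the function name `translate_to_cassandra` and the mapping table are assumptions; butterfree's real method works on its own feature set schema.

```python
from typing import Any, Dict, List

# Assumed mapping from Spark SQL type names to Cassandra CQL types.
SPARK_TO_CQL = {
    "StringType": "text",
    "IntegerType": "int",
    "LongType": "bigint",
    "FloatType": "float",
    "DoubleType": "double",
    "BooleanType": "boolean",
    "TimestampType": "timestamp",
    "DateType": "date",
    "BinaryType": "blob",
}


def translate_to_cassandra(schema: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical: translate a feature set schema into Cassandra column specs."""
    return [
        {
            "column_name": column["column_name"],
            "type": SPARK_TO_CQL[column["type"]],  # raises KeyError for unmapped types
            "primary_key": column.get("primary_key", False),
        }
        for column in schema
    ]


schema = [
    {"column_name": "id", "type": "LongType", "primary_key": True},
    {"column_name": "timestamp", "type": "TimestampType", "primary_key": False},
    {"column_name": "score", "type": "DoubleType"},
]
print(translate_to_cassandra(schema))
```

Failing loudly on an unmapped type is a deliberate choice in this sketch: silently defaulting to `text` would create a table whose column types no longer match the feature set.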

property username

Username used in connection to Cassandra DB.

butterfree.configs.db.kafka_config module

Holds configurations to read and write with Spark to Kafka.

class butterfree.configs.db.kafka_config.KafkaConfig(kafka_connection_string: str = None, mode: str = None, format_: str = None, stream_processing_time: str = None, stream_output_mode: str = None, stream_checkpoint_path: str = None)

Bases: butterfree.configs.db.abstract_config.AbstractWriteConfig

Configuration for Spark to connect to Kafka.

kafka_connection_string

string with the hosts and ports to connect to.

mode

write mode for Spark.

format_

write format for Spark.

stream_processing_time

processing time interval for streaming jobs.

stream_output_mode

output mode used when writing streaming data.

stream_checkpoint_path

path on S3 to save checkpoints for the stream job.

More information about processing_time, output_mode and checkpoint_path can be found in Spark documentation: [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)

property format_

Write format for Spark.

get_options(topic: str) → dict

Get options for connecting to Kafka.

Options will be a dictionary with the write and read configuration for Spark to Kafka.

Parameters

topic – Kafka topic to read from or write to.

Returns

Configuration to connect to Kafka.
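Spark's Kafka integration takes the broker list through the `kafka.bootstrap.servers` option, the destination topic through `topic` when writing, and the source topic through `subscribe` when reading. The hypothetical helpers below build both dictionaries from a connection string and a topic; whether butterfree's `get_options` returns exactly these keys is an assumption.

```python
from typing import Dict


def kafka_write_options(connection_string: str, topic: str) -> Dict[str, str]:
    """Hypothetical helper: options for df.write.format("kafka")."""
    return {"kafka.bootstrap.servers": connection_string, "topic": topic}


def kafka_read_options(connection_string: str, topic: str) -> Dict[str, str]:
    """Hypothetical helper: options for spark.read.format("kafka")."""
    return {"kafka.bootstrap.servers": connection_string, "subscribe": topic}
```

Note that Spark's Kafka sink expects the DataFrame to contain a `value` column (string or binary), with `key` optional, so feature rows usually need to be serialized before writing.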

property kafka_connection_string

Connection string with the Kafka hosts and ports.

property mode

Write mode for Spark.

property stream_checkpoint_path

Path on S3 to save checkpoints for the stream job.

property stream_output_mode

Output mode used when writing streaming data.

property stream_processing_time

Processing time interval for streaming jobs.

translate(schema) → List[Dict]

Translate the feature set schema into a Kafka schema.

The output will be a list of dictionaries describing the Kafka schema.

Parameters

schema – feature set schema in spark.

Returns

Kafka schema.

butterfree.configs.db.s3_config module

Holds configurations to read and write with Spark to AWS S3.

class butterfree.configs.db.s3_config.S3Config(bucket: str = None, mode: str = None, format_: str = None)

Bases: butterfree.configs.db.abstract_config.AbstractWriteConfig

Configuration for Spark metastore database stored on AWS S3.

database

database name.

mode

writing mode used by writers.

format_

expected stored file format.

path

database root location.

partition_by

partition column to use when writing.

property bucket

Bucket name.

property format_

Expected stored file format.

get_options(key: str) → dict

Get options for AWS S3.

Options will be a dictionary with the write and read configuration for Spark to AWS S3.

Parameters

key – path within the AWS S3 bucket where data will be saved.

Returns

Options configuration for AWS S3.
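A plausible shape for the S3 options is a single `path` option pointing at an `s3a://` URI built from the bucket and the key. The helper `s3_options` below is a hypothetical sketch under that assumption; it only normalizes slashes and never contacts AWS.

```python
from typing import Dict


def s3_options(bucket: str, key: str) -> Dict[str, str]:
    """Hypothetical helper: build a Spark "path" option for an S3 location."""
    bucket = bucket.strip("/")
    key = key.lstrip("/")
    # s3a:// is the scheme of the Hadoop S3A connector commonly used with Spark.
    return {"path": f"s3a://{bucket}/{key}"}


print(s3_options("my-bucket", "/historical/feature_set"))
# {'path': 's3a://my-bucket/historical/feature_set'}
```

Because PySpark's `DataFrameWriter.save()` falls back to the `path` option when no path argument is given, a dictionary like this can be passed through `.options(**...)` before calling `.save()`.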

property mode

Writing mode used by writers.

translate(schema) → List[Dict]

Translate feature set spark schema to the corresponding database.

Module contents

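This module holds database configurations to be used by clients. It re-exports AbstractWriteConfig, CassandraConfig, KafkaConfig and S3Config from the submodules, so they can also be imported directly from butterfree.configs.db.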
