butterfree.validations package

Submodules

Validation implementing basic checks over the dataframe.

class butterfree.validations.basic_validaton.BasicValidation(dataframe: Optional[DataFrame] = None)

Bases: Validation

Basic validation suite for Feature Set’s dataframe.

dataframe

object to be verified

check() → None

Check basic validation properties about the dataframe.

Raises:

ValueError – if any of the verifications fail

validate_column_ts() → None

Check dataframe’s ts column.

Raises:

ValueError – if the dataframe doesn’t have a column named ts.

validate_df_is_empty() → None

Check dataframe emptiness.

Raises:

ValueError – if dataframe is empty and is not streaming.

validate_df_is_spark_df() → None

Check type of dataframe object.

Raises:

ValueError – if the dataframe is not an instance of pyspark.sql.DataFrame.
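The three checks documented above can be pictured with a small, self-contained sketch. It is not butterfree's implementation: `FrameStub` and `basic_checks` are hypothetical names introduced here so the example runs without Spark, the order of the checks is an assumption, and the error messages are illustrative. The real suite tests against pyspark.sql.DataFrame rather than a stub.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FrameStub:
    """Hypothetical stand-in for a Spark DataFrame, for illustration only."""

    columns: List[str]
    rows: List[tuple] = field(default_factory=list)
    isStreaming: bool = False


def basic_checks(dataframe: Any) -> None:
    """Apply the three documented checks; raise ValueError on the first failure."""
    # 1. Type check (the documented suite checks for pyspark.sql.DataFrame).
    if not isinstance(dataframe, FrameStub):
        raise ValueError("dataframe must be a FrameStub in this sketch")
    # 2. Timestamp column check: a column named "ts" must exist.
    if "ts" not in dataframe.columns:
        raise ValueError("dataframe must have a column named 'ts'")
    # 3. Emptiness check, skipped when the dataframe is streaming.
    if not dataframe.isStreaming and len(dataframe.rows) == 0:
        raise ValueError("dataframe can't be empty")
```

In this sketch an empty streaming frame passes, while an empty batch frame raises ValueError, matching the "empty and is not streaming" condition described above.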

Abstract Validation class.

class butterfree.validations.validation.Validation(dataframe: Optional[DataFrame] = None)

Bases: ABC

Validate dataframe properties.

dataframe

data to be verified.

abstract check() → None

Check validation properties about the dataframe.

Raises:

ValueError – if any of the verifications fail.

input(dataframe: DataFrame) → Validation

Input a dataframe to check.

Parameters:

dataframe – data to check.
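The interface above (an abstract check() plus an input() that returns a Validation) suggests a chaining pattern. The following is a minimal sketch of that pattern using stand-in classes, not butterfree's source: `ValidationSketch` and `HasColumnsValidation` are hypothetical names, and it is an assumption here that input() stores the dataframe and returns the same instance. A `SimpleNamespace` with a `columns` attribute stands in for a Spark DataFrame.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Iterable, Optional


class ValidationSketch(ABC):
    """Stand-in mirroring the documented interface (not butterfree's code)."""

    def __init__(self, dataframe: Optional[Any] = None) -> None:
        self.dataframe = dataframe

    def input(self, dataframe: Any) -> "ValidationSketch":
        # Assumption: input() stores the dataframe and returns this instance,
        # which is what makes `.input(df).check()` chaining possible.
        self.dataframe = dataframe
        return self

    @abstractmethod
    def check(self) -> None:
        """Raise ValueError if any verification fails."""


class HasColumnsValidation(ValidationSketch):
    """Hypothetical subclass that requires a fixed set of column names."""

    def __init__(self, required: Iterable[str], dataframe: Optional[Any] = None) -> None:
        super().__init__(dataframe)
        self.required = tuple(required)

    def check(self) -> None:
        # Relies only on a `columns` attribute, which Spark DataFrames expose.
        missing = [c for c in self.required if c not in self.dataframe.columns]
        if missing:
            raise ValueError(f"missing columns: {missing}")


# Usage: a stub object with `columns` stands in for a dataframe.
df = SimpleNamespace(columns=["id", "ts"])
HasColumnsValidation(["id", "ts"]).input(df).check()  # passes silently
```

Because check() is abstract, the base class cannot be instantiated directly; each concrete validation supplies its own verification logic and signals failure with ValueError.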

Module contents

Holds dataframe validations for multiple destinations.

class butterfree.validations.BasicValidation(dataframe: Optional[DataFrame] = None)

Bases: Validation

Basic validation suite for Feature Set’s dataframe.

dataframe

object to be verified

check() → None

Check basic validation properties about the dataframe.

Raises:

ValueError – if any of the verifications fail

validate_column_ts() → None

Check dataframe’s ts column.

Raises:

ValueError – if the dataframe doesn’t have a column named ts.

validate_df_is_empty() → None

Check dataframe emptiness.

Raises:

ValueError – if dataframe is empty and is not streaming.

validate_df_is_spark_df() → None

Check type of dataframe object.

Raises:

ValueError – if the dataframe is not an instance of pyspark.sql.DataFrame.