butterfree.validations package

Submodules

Validation implementing basic checks over the dataframe.

class butterfree.validations.basic_validaton.BasicValidation(dataframe: pyspark.sql.dataframe.DataFrame = None)

Bases: butterfree.validations.validation.Validation

Basic validation suite for a Feature Set’s dataframe.

dataframe

object to be verified.

check()

Check basic validation properties about the dataframe.

Raises

ValueError – if any of the verifications fail.

validate_column_ts()

Check dataframe’s ts column.

Raises

ValueError – if the dataframe does not have a column named ts.

validate_df_is_empty()

Check dataframe emptiness.

Raises

ValueError – if the dataframe is empty and is not a streaming dataframe.

validate_df_is_spark_df()

Check type of dataframe object.

Raises

ValueError – if the dataframe is not an instance of pyspark.sql.DataFrame.
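
A minimal usage sketch, assuming a local SparkSession; the column names and values other than ts are illustrative, and the module path is the one documented above:

    from datetime import datetime

    from pyspark.sql import SparkSession

    from butterfree.validations.basic_validaton import BasicValidation

    spark = SparkSession.builder.getOrCreate()

    # A non-empty batch dataframe carrying the required "ts" column.
    df = spark.createDataFrame([(1, datetime(2021, 1, 1))], ["id", "ts"])

    validation = BasicValidation(dataframe=df)
    validation.check()  # passes silently; raises ValueError on failure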

Abstract Validation class.

class butterfree.validations.validation.Validation(dataframe: pyspark.sql.dataframe.DataFrame = None)

Bases: abc.ABC

Validate dataframe properties.

dataframe

data to be verified.

abstract check() → None

Check validation properties about the dataframe.

Raises

ValueError – if any of the verifications fail.

input(dataframe: pyspark.sql.dataframe.DataFrame)

Input a dataframe to check.

Parameters

dataframe – data to check.
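
Since Validation is an ABC, concrete suites subclass it and implement check(). The sketch below uses a hypothetical NonNullIdValidation subclass and assumes input() stores the given dataframe on the dataframe attribute documented above:

    from pyspark.sql import SparkSession

    from butterfree.validations.validation import Validation

    class NonNullIdValidation(Validation):
        """Hypothetical suite: fail when the "id" column contains nulls."""

        def check(self) -> None:
            # self.dataframe is the attribute documented on the base class.
            if self.dataframe.filter(self.dataframe["id"].isNull()).count() > 0:
                raise ValueError("column id contains null values")

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (None,)], "id INT")

    validation = NonNullIdValidation()
    validation.input(df)
    validation.check()  # raises ValueError: column id contains null values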

Module contents

Holds dataframe validations for multiple destinations.

class butterfree.validations.BasicValidation(dataframe: pyspark.sql.dataframe.DataFrame = None)

Bases: butterfree.validations.validation.Validation

Basic validation suite for a Feature Set’s dataframe.

dataframe

object to be verified.

check()

Check basic validation properties about the dataframe.

Raises

ValueError – if any of the verifications fail.

validate_column_ts()

Check dataframe’s ts column.

Raises

ValueError – if the dataframe does not have a column named ts.

validate_df_is_empty()

Check dataframe emptiness.

Raises

ValueError – if the dataframe is empty and is not a streaming dataframe.

validate_df_is_spark_df()

Check type of dataframe object.

Raises

ValueError – if the dataframe is not an instance of pyspark.sql.DataFrame.
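
As documented above, BasicValidation is re-exported at the package level, so it can be imported from butterfree.validations directly. A minimal sketch of the failure path, assuming a local SparkSession:

    from pyspark.sql import SparkSession

    from butterfree.validations import BasicValidation

    spark = SparkSession.builder.getOrCreate()

    # An empty batch dataframe: the "ts" column is present,
    # but validate_df_is_empty should reject it.
    empty_df = spark.createDataFrame([], "id INT, ts TIMESTAMP")

    validation = BasicValidation()
    validation.input(empty_df)

    try:
        validation.check()
    except ValueError as err:
        print(f"validation failed: {err}")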