butterfree.transform.features package¶
Submodules¶
Feature entity.
- class butterfree.transform.features.feature.Feature(name: str, description: str, dtype: Optional[DataType] = None, from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None)¶
Bases:
object
Defines a Feature.
A Feature is the result of a transformation over one (or more) data columns over an input dataframe. Transformations can be as simple as renaming, casting types, mathematical expressions or complex functions/models.
- name¶
feature name. Can be used by the transformation to derive multiple output columns.
- description¶
brief explanation regarding the feature.
- dtype¶
data type for the output columns of this feature.
- from_column¶
original column to build the feature. Used when there is no transformation, or when the transformation has no reference to the column it should use.
- transformation¶
transformation that will be applied to create this feature.
- property dtype: Any¶
Attribute dtype getter.
- property from_column: Any¶
Attribute from_column getter.
- get_output_columns() → List[str]¶
Get the output columns that will be generated by this feature.
- Returns:
Output column names.
- transform(dataframe: DataFrame) → DataFrame¶
Performs a transformation to the feature pipeline.
- Parameters:
dataframe – input dataframe for the transformation.
- Returns:
Transformed dataframe.
- property transformation: Any¶
Attribute transformation getter.
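The simplest case described above, a feature that only renames and casts a source column, can be sketched in plain Python. This is a stand-in using lists of dicts rather than Spark dataframes; `build_feature` and its `cast` parameter are hypothetical names for illustration, not part of butterfree, and keeping the source column in the output is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def build_feature(
    rows: List[Row],
    name: str,
    from_column: Optional[str] = None,
    cast: Optional[Callable[[Any], Any]] = None,
) -> List[Row]:
    """Hypothetical stand-in: add a column `name` read from `from_column`."""
    # Without from_column, the feature is assumed to read the column
    # that already carries its own name.
    source = from_column or name
    result = []
    for row in rows:
        value = row[source]
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[name] = cast(value) if cast is not None else value
        result.append(new_row)
    return result


rows = [{"id": 1, "price_str": "10.5"}, {"id": 2, "price_str": "7"}]
renamed = build_feature(rows, name="price", from_column="price_str", cast=float)
```

Here `renamed` holds the same rows with an extra `price` column of type `float`, which mirrors the "renaming, casting types" end of the transformation spectrum.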
KeyFeature entity.
- class butterfree.transform.features.key_feature.KeyFeature(name: str, description: str, dtype: DataType, from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None)¶
Bases:
Feature
Defines a KeyFeature.
A FeatureSet must contain one or more KeyFeatures, which will be used as keys when storing the feature set dataframe as tables. The FeatureSet may validate that keys are unique for the latest state of a feature set.
- name¶
key name. Can be used by the transformation to derive multiple key columns.
- description¶
brief explanation regarding the key.
- dtype¶
data type for the output column of this key.
- from_column¶
original column to build a key. Used when there is no transformation, or when the transformation has no reference to the column it should use.
- transformation¶
transformation that will be applied to create this key. Keys can be derived by transformations over any data column, for example a location hash based on latitude and longitude.
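The location-hash idea mentioned for key transformations could look roughly like the following. This is a plain-Python illustration only: `location_key`, the rounding precision and the choice of SHA-256 are assumptions made for this sketch, not butterfree behaviour.

```python
import hashlib


def location_key(latitude: float, longitude: float, precision: int = 4) -> str:
    """Hypothetical helper: hash rounded coordinates into a stable key."""
    # Rounding first means nearby points collapse into the same key.
    canonical = f"{latitude:.{precision}f},{longitude:.{precision}f}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = location_key(-23.5505, -46.6333)
key_b = location_key(-23.55051, -46.63329)  # rounds to the same coordinates
key_c = location_key(40.7128, -74.0060)
```

In a real pipeline the same logic would live inside a transformation applied to the latitude and longitude columns, with the resulting column declared as the key.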
TimestampFeature entity.
- class butterfree.transform.features.timestamp_feature.TimestampFeature(from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None, from_ms: bool = False, mask: Optional[str] = None)¶
Bases:
Feature
Defines a TimestampFeature.
A FeatureSet must contain one TimestampFeature, which will be used as a time tag for the state of all features. By containing a timestamp feature, users may time travel over their features. The Feature Set may validate that the set of keys and timestamp are unique for a feature set.
By defining a TimestampFeature, the feature set will always contain a data column called “timestamp” of TimestampType (spark dtype).
- from_column¶
original column to build a “timestamp” feature column. Used when there is no transformation, or when the transformation has no reference to the column it should use. If from_column is None, the FeatureSet will assume the input dataframe already has a data column called “timestamp”.
- transformation¶
transformation that will be applied to create the “timestamp”. When no transformation is given, type casting is applied automatically. A timestamp can also be derived from multiple columns, such as year, month and day. The transformation must always handle naming and typing.
- from_ms¶
true if the timestamp column is expressed in milliseconds. A conversion is then performed.
- mask¶
timestamp format specified by the user.
- transform(dataframe: DataFrame) → DataFrame¶
Performs a transformation to the feature pipeline.
- Parameters:
dataframe – input dataframe for the transformation.
- Returns:
Transformed dataframe.
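To make the roles of from_ms and mask concrete, here is a small standard-library sketch. It is not butterfree's implementation: `to_timestamp` is a hypothetical helper, the mask uses Python `strptime` directives rather than Spark datetime patterns, and treating every value as UTC is an assumption of this example.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def to_timestamp(
    value: Union[int, float, str],
    from_ms: bool = False,
    mask: Optional[str] = None,
) -> datetime:
    """Hypothetical helper mirroring the from_ms and mask options."""
    if mask is not None:
        # Parse a formatted string using the user-supplied format.
        return datetime.strptime(str(value), mask).replace(tzinfo=timezone.utc)
    # Epoch values in milliseconds are scaled down to seconds first.
    seconds = value / 1000 if from_ms else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


from_millis = to_timestamp(1_600_000_000_000, from_ms=True)
from_seconds = to_timestamp(1_600_000_000)
from_text = to_timestamp("13/09/2020 12:26:40", mask="%d/%m/%Y %H:%M:%S")
```

All three calls describe the same instant, which is the point of the two options: they normalise differently encoded inputs into one “timestamp” column.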
Module contents¶
Holds all feature types to be part of a FeatureSet.
- class butterfree.transform.features.Feature(name: str, description: str, dtype: Optional[DataType] = None, from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None)¶
Bases:
object
Defines a Feature.
A Feature is the result of a transformation over one (or more) data columns over an input dataframe. Transformations can be as simple as renaming, casting types, mathematical expressions or complex functions/models.
- name¶
feature name. Can be used by the transformation to derive multiple output columns.
- description¶
brief explanation regarding the feature.
- dtype¶
data type for the output columns of this feature.
- from_column¶
original column to build the feature. Used when there is no transformation, or when the transformation has no reference to the column it should use.
- transformation¶
transformation that will be applied to create this feature.
- property dtype: Any¶
Attribute dtype getter.
- property from_column: Any¶
Attribute from_column getter.
- get_output_columns() → List[str]¶
Get the output columns that will be generated by this feature.
- Returns:
Output column names.
- transform(dataframe: DataFrame) → DataFrame¶
Performs a transformation to the feature pipeline.
- Parameters:
dataframe – input dataframe for the transformation.
- Returns:
Transformed dataframe.
- property transformation: Any¶
Attribute transformation getter.
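The note that a feature name can be used to derive multiple output columns can be pictured with a small stand-in class. `SketchFeature`, its `suffixes` field and the double-underscore naming are hypothetical choices for this sketch; the actual column names depend on the transformation in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchFeature:
    """Hypothetical stand-in, not the butterfree Feature class."""

    name: str
    # One suffix per column the (imaginary) transformation would emit.
    suffixes: List[str] = field(default_factory=list)

    def get_output_columns(self) -> List[str]:
        # With no transformation, the feature yields one column named after it.
        if not self.suffixes:
            return [self.name]
        # Otherwise each derived column is prefixed with the feature name.
        return [f"{self.name}__{suffix}" for suffix in self.suffixes]


plain = SketchFeature("price")
windowed = SketchFeature("price", suffixes=["avg_7d", "max_7d"])
```

Calling `get_output_columns()` on `plain` gives a single column, while `windowed` expands into one column per suffix.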
- class butterfree.transform.features.KeyFeature(name: str, description: str, dtype: DataType, from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None)¶
Bases:
Feature
Defines a KeyFeature.
A FeatureSet must contain one or more KeyFeatures, which will be used as keys when storing the feature set dataframe as tables. The FeatureSet may validate that keys are unique for the latest state of a feature set.
- name¶
key name. Can be used by the transformation to derive multiple key columns.
- description¶
brief explanation regarding the key.
- dtype¶
data type for the output column of this key.
- from_column¶
original column to build a key. Used when there is no transformation, or when the transformation has no reference to the column it should use.
- transformation¶
transformation that will be applied to create this key. Keys can be derived by transformations over any data column, for example a location hash based on latitude and longitude.
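A key-uniqueness check of the kind a feature set may run over its latest state can be sketched as below. This is an illustration over plain dicts; `duplicated_keys` is a hypothetical helper and says nothing about how butterfree performs the validation.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def duplicated_keys(
    rows: List[Dict[str, Any]], key_columns: List[str]
) -> List[Tuple[Any, ...]]:
    """Hypothetical check: return key tuples that occur more than once."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


latest_state = [
    {"user_id": 1, "city": "SP", "orders": 3},
    {"user_id": 2, "city": "RJ", "orders": 5},
    {"user_id": 1, "city": "SP", "orders": 4},  # same key as the first row
]
duplicates = duplicated_keys(latest_state, ["user_id", "city"])
```

An empty result would mean every key combination identifies exactly one row, which is what storing the feature set as a keyed table relies on.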
- class butterfree.transform.features.TimestampFeature(from_column: Optional[str] = None, transformation: Optional[TransformComponent] = None, from_ms: bool = False, mask: Optional[str] = None)¶
Bases:
Feature
Defines a TimestampFeature.
A FeatureSet must contain one TimestampFeature, which will be used as a time tag for the state of all features. By containing a timestamp feature, users may time travel over their features. The Feature Set may validate that the set of keys and timestamp are unique for a feature set.
By defining a TimestampFeature, the feature set will always contain a data column called “timestamp” of TimestampType (spark dtype).
- from_column¶
original column to build a “timestamp” feature column. Used when there is no transformation, or when the transformation has no reference to the column it should use. If from_column is None, the FeatureSet will assume the input dataframe already has a data column called “timestamp”.
- transformation¶
transformation that will be applied to create the “timestamp”. When no transformation is given, type casting is applied automatically. A timestamp can also be derived from multiple columns, such as year, month and day. The transformation must always handle naming and typing.
- from_ms¶
true if the timestamp column is expressed in milliseconds. A conversion is then performed.
- mask¶
timestamp format specified by the user.
- transform(dataframe: DataFrame) → DataFrame¶
Performs a transformation to the feature pipeline.
- Parameters:
dataframe – input dataframe for the transformation.
- Returns:
Transformed dataframe.
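Deriving the “timestamp” from several columns, such as year, month and day, might be sketched as follows. The example uses plain dicts in place of a dataframe, and `add_timestamp` is a hypothetical function name; it shows the transformation taking responsibility for both the column name and its type.

```python
from datetime import datetime
from typing import Any, Dict, List


def add_timestamp(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical transformation: build "timestamp" from year, month, day."""
    return [
        # The output column is named "timestamp" and typed as a datetime.
        {**row, "timestamp": datetime(row["year"], row["month"], row["day"])}
        for row in rows
    ]


events = [
    {"id": 1, "year": 2020, "month": 1, "day": 15},
    {"id": 2, "year": 2021, "month": 12, "day": 31},
]
with_timestamp = add_timestamp(events)
```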