butterfree.transform.utils package
Submodules
Utils for date range generation.
- butterfree.transform.utils.date_range.get_date_range(client: SparkClient, start_date: Union[str, datetime], end_date: Union[str, datetime], step: Optional[int] = None) -> DataFrame
Create a date range dataframe.
The dataframe returned by this method contains a single column, TIMESTAMP_COLUMN, of timestamp type, with dates between start and end.
- Parameters:
client – a spark client.
start_date – range beginning value (inclusive).
end_date – range last value (exclusive).
step – optional step, in seconds.
- Returns:
A single column date range spark dataframe.
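The range semantics described above (inclusive start, exclusive end, step in seconds) can be illustrated without Spark. The following is a minimal pure-Python sketch, not butterfree's implementation: date_range_sketch is a hypothetical helper, and the one-day fallback used when step is omitted is an assumption, since the default is not stated here.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Union

# Assumption: fall back to one day when no step is given (not confirmed by the docs above).
ONE_DAY_IN_SECONDS = 86_400


def date_range_sketch(
    start_date: Union[str, datetime],
    end_date: Union[str, datetime],
    step: Optional[int] = None,
) -> List[datetime]:
    """Return timestamps from start_date (inclusive) to end_date (exclusive)."""

    def to_datetime(value: Union[str, datetime]) -> datetime:
        # Accept ISO-formatted strings such as "2020-01-01" or ready-made datetimes.
        return value if isinstance(value, datetime) else datetime.fromisoformat(value)

    seconds = ONE_DAY_IN_SECONDS if step is None else step
    if seconds <= 0:
        raise ValueError("step must be a positive number of seconds")

    current, end = to_datetime(start_date), to_datetime(end_date)
    delta = timedelta(seconds=seconds)
    dates = []
    while current < end:  # strict comparison keeps the end value out of the range
        dates.append(current)
        current += delta
    return dates


daily = date_range_sketch("2020-01-01", "2020-01-04")
hourly = date_range_sketch("2020-01-01", "2020-01-02", step=3600)
```

In butterfree itself, values like these would be the rows of the TIMESTAMP_COLUMN column in the returned Spark dataframe rather than a Python list.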
Utils for generating a namedtuple-like object from custom or Spark functions.
- class butterfree.transform.utils.function.Function(func: Callable, data_type: DataType)
Bases:
object
Define a class Function.
- Like a namedtuple:
Function = namedtuple("Function", ["function", "data_type"]).
- func
Custom or Spark functions, such as avg, std, count. For more information, check the Spark functions documentation.
- For custom functions, see the path:
'butterfree/transform/transformations/user_defined_functions'.
- data_type
Data type for the output columns.
- property func: Callable
Function to be used in the transformation.
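The namedtuple analogy above can be made concrete with the standard library. This is only an illustrative sketch: it uses collections.namedtuple directly rather than the butterfree class, statistics.mean stands in for a Spark aggregation such as avg, and the string "double" stands in for a Spark DataType instance.

```python
from collections import namedtuple
from statistics import mean

# Same shape as the analogy in the docstring: a function paired with its output type.
Function = namedtuple("Function", ["function", "data_type"])

# Stand-ins (assumptions): a plain Python callable and a type name given as a string.
avg = Function(function=mean, data_type="double")

# The wrapped callable and its declared output type travel together.
result = avg.function([1.0, 2.0, 3.0])
print(result, avg.data_type)
```

Note that the actual class exposes the callable as func, whereas the namedtuple analogy names the field function.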
Holds functions for defining windows in DataFrames.
- class butterfree.transform.utils.window_spec.FrameBoundaries(mode: Optional[str], window_definition: str)
Bases:
object
Utility functions for defining the frame boundaries.
- Parameters:
mode – available modes to be used in time aggregations.
window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).
- get(window: WindowSpec) -> Any
Returns window with or without the frame boundaries.
- property window_size: int
Returns window size.
- property window_unit: str
Returns window unit.
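The window_size and window_unit properties suggest that a window definition pairs a number with a time unit. Below is a minimal sketch of such a split, assuming the definition is a string like "7 days" (an integer, whitespace, then a unit). split_window_definition is a hypothetical helper and is not part of butterfree.

```python
from typing import Tuple


def split_window_definition(window_definition: str) -> Tuple[int, str]:
    """Split a definition such as "7 days" into its size and its unit."""
    parts = window_definition.split()
    if len(parts) != 2:
        raise ValueError(
            f"expected '<size> <unit>', got {window_definition!r}"
        )
    size, unit = parts
    # int() raises ValueError if the size is not a whole number.
    return int(size), unit


size, unit = split_window_definition("7 days")
print(size, unit)
```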
- class butterfree.transform.utils.window_spec.Window(window_definition: str, partition_by: Optional[Union[Column, str, List[str]]] = None, order_by: Optional[Union[Column, str]] = None, mode: Optional[str] = None, slide: Optional[str] = None)
Bases:
object
Utility functions for defining a window specification.
- Parameters:
partition_by – the partitioning defined.
order_by – the ordering defined.
mode – available modes to be used in time aggregations.
window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).
Use the static methods in Window to create a WindowSpec.
- DEFAULT_SLIDE_DURATION: str = '1 day'
- get() -> Any
Defines a common window to be used both in time and rows windows.
- get_name() -> str
Return window suffix name based on passed criteria.
Module contents
This module holds utils to be used by transformations.
- class butterfree.transform.utils.Function(func: Callable, data_type: DataType)
Bases:
object
Define a class Function.
- Like a namedtuple:
Function = namedtuple("Function", ["function", "data_type"]).
- func
Custom or Spark functions, such as avg, std, count. For more information, check the Spark functions documentation.
- For custom functions, see the path:
'butterfree/transform/transformations/user_defined_functions'.
- data_type
Data type for the output columns.
- property func: Callable
Function to be used in the transformation.
- class butterfree.transform.utils.Window(window_definition: str, partition_by: Optional[Union[Column, str, List[str]]] = None, order_by: Optional[Union[Column, str]] = None, mode: Optional[str] = None, slide: Optional[str] = None)
Bases:
object
Utility functions for defining a window specification.
- Parameters:
partition_by – the partitioning defined.
order_by – the ordering defined.
mode – available modes to be used in time aggregations.
window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).
Use the static methods in Window to create a WindowSpec.
- DEFAULT_SLIDE_DURATION: str = '1 day'
- get() -> Any
Defines a common window to be used both in time and rows windows.
- get_name() -> str
Return window suffix name based on passed criteria.