butterfree.transform.utils package

Submodules

Utils for date range generation.

butterfree.transform.utils.date_range.get_date_range(client: SparkClient, start_date: Union[str, datetime], end_date: Union[str, datetime], step: Optional[int] = None) DataFrame

Create a date range dataframe.

The dataframe returned by this method contains a single column, TIMESTAMP_COLUMN, of timestamp type, with dates between the start and end dates.

Parameters:
  • client – a spark client.

  • start_date – range beginning value (inclusive).

  • end_date – range last value (exclusive).

  • step – optional step, in seconds.

Returns:

A single column date range spark dataframe.
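The range semantics described above (inclusive start, exclusive end, step in seconds) can be sketched in plain Python. This is only an illustration, not butterfree's implementation: the helper name date_range_sketch is hypothetical, it returns a list of datetimes rather than a Spark dataframe, and the one-day default step is an assumption, since the default is not stated here.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Union

# Assumption: a default step of one day, in seconds (not stated in the docs).
ASSUMED_DEFAULT_STEP = 24 * 60 * 60


def _to_datetime(value: Union[str, datetime]) -> datetime:
    """Accept either an ISO-formatted string or a datetime object."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def date_range_sketch(
    start_date: Union[str, datetime],
    end_date: Union[str, datetime],
    step: Optional[int] = None,
) -> List[datetime]:
    """Hypothetical stand-in: timestamps from start (inclusive) to end (exclusive)."""
    step_seconds = ASSUMED_DEFAULT_STEP if step is None else step
    if step_seconds <= 0:
        raise ValueError("step must be a positive number of seconds")

    current = _to_datetime(start_date)
    end = _to_datetime(end_date)
    delta = timedelta(seconds=step_seconds)

    timestamps = []
    while current < end:  # strict comparison keeps the end value out
        timestamps.append(current)
        current += delta
    return timestamps
```

Under these assumptions, date_range_sketch("2020-01-01", "2020-01-04") yields three daily timestamps, and the end date itself is left out.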

Utils for wrapping a custom or Spark function in a namedtuple-like class.

class butterfree.transform.utils.function.Function(func: Callable, data_type: DataType)

Bases: object

Define a class Function.

Like a namedtuple:

Function = namedtuple("Function", ["function", "data_type"]).

func

custom or spark functions, such as avg, std, count. For more information check spark functions:

For custom functions, see the path:

‘butterfree/transform/transformations/user_defined_functions’.

data_type

data type for the output columns.

property data_type: DataType

Data type for the output columns.

property func: Callable

Function to be used in the transformation.
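The idea of pairing a callable with the data type of its output can be sketched with a plain NamedTuple. This is a hypothetical stand-in, not the butterfree class: FunctionSketch is an invented name, the data type is a plain string rather than a DataType member, and statistics.mean stands in for a Spark aggregation such as avg.

```python
import statistics
from typing import Any, Callable, NamedTuple


class FunctionSketch(NamedTuple):
    """Hypothetical stand-in pairing a callable with its output data type."""

    func: Callable[..., Any]
    data_type: str  # assumption: a string label instead of a DataType member


# Pair an aggregation with the type its result should be stored as.
mean_as_double = FunctionSketch(func=statistics.mean, data_type="double")

# Both pieces travel together and are read back by attribute name.
result = mean_as_double.func([1.0, 2.0, 3.0])
label = mean_as_double.data_type
```

Keeping the two values in one object lets a transformation apply the function and cast the output column without looking the type up elsewhere.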

Holds functions for defining windows in DataFrames.

class butterfree.transform.utils.window_spec.FrameBoundaries(mode: Optional[str], window_definition: str)

Bases: object

Utility functions for defining the frame boundaries.

Parameters:
  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

get(window: WindowSpec) Any

Returns window with or without the frame boundaries.

property window_size: int

Returns window size.

property window_unit: str

Returns window unit.
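How a window definition might split into the size and unit exposed by window_size and window_unit can be sketched as follows. This assumes the definition has the form "<integer> <unit>", as the value '1 day' suggests. The helper split_window_definition is hypothetical, and butterfree's own parsing and validation may differ.

```python
from typing import Tuple

# Units listed for window_definition, in singular form.
ALLOWED_UNITS = {"second", "minute", "hour", "day", "week", "year"}


def split_window_definition(window_definition: str) -> Tuple[int, str]:
    """Hypothetical helper: split a definition like '7 days' into (7, 'days')."""
    parts = window_definition.split()
    if len(parts) != 2:
        raise ValueError("expected '<size> <unit>', got %r" % window_definition)

    size_text, unit = parts
    if not size_text.isdigit():
        raise ValueError("window size must be a non-negative integer")

    # Accept both singular and plural spellings of each unit.
    singular = unit[:-1] if unit.endswith("s") else unit
    if singular not in ALLOWED_UNITS:
        raise ValueError("unsupported window unit: %r" % unit)

    return int(size_text), unit
```

With this sketch, split_window_definition("7 days") returns the size 7 and the unit "days", while an unlisted unit raises a ValueError.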

class butterfree.transform.utils.window_spec.Window(window_definition: str, partition_by: Optional[Union[Column, str, List[str]]] = None, order_by: Optional[Union[Column, str]] = None, mode: Optional[str] = None, slide: Optional[str] = None)

Bases: object

Utility functions for defining a window specification.

Parameters:
  • partition_by – the partitioning defined.

  • order_by – the ordering defined.

  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

Use the static methods in Window to create a WindowSpec.

DEFAULT_SLIDE_DURATION: str = '1 day'

get() Any

Defines a common window to be used both in time and rows windows.

get_name() str

Return window suffix name based on passed criteria.

Module contents

This module holds utils to be used by transformations.

class butterfree.transform.utils.Function(func: Callable, data_type: DataType)

Bases: object

Define a class Function.

Like a namedtuple:

Function = namedtuple("Function", ["function", "data_type"]).

func

custom or spark functions, such as avg, std, count. For more information check spark functions:

For custom functions, see the path:

‘butterfree/transform/transformations/user_defined_functions’.

data_type

data type for the output columns.

property data_type: DataType

Data type for the output columns.

property func: Callable

Function to be used in the transformation.

class butterfree.transform.utils.Window(window_definition: str, partition_by: Optional[Union[Column, str, List[str]]] = None, order_by: Optional[Union[Column, str]] = None, mode: Optional[str] = None, slide: Optional[str] = None)

Bases: object

Utility functions for defining a window specification.

Parameters:
  • partition_by – the partitioning defined.

  • order_by – the ordering defined.

  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows; it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

Use the static methods in Window to create a WindowSpec.

DEFAULT_SLIDE_DURATION: str = '1 day'

get() Any

Defines a common window to be used both in time and rows windows.

get_name() str

Return window suffix name based on passed criteria.