butterfree.transform.utils package

Submodules

Utils for date range generation.

butterfree.transform.utils.date_range.get_date_range(client: butterfree.clients.spark_client.SparkClient, start_date: Union[str, datetime.datetime], end_date: Union[str, datetime.datetime], step: int = None)

Create a date range dataframe.

The dataframe returned by this method contains a single column, TIMESTAMP_COLUMN, of timestamp type, with dates between start and end.

Parameters
  • client – a spark client.

  • start_date – range beginning value (inclusive).

  • end_date – range last value (exclusive).

  • step – optional step, in seconds.

Returns

A single column date range spark dataframe.
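The inclusive start and exclusive end described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not butterfree's implementation: the helper name date_range_sketch is hypothetical, it returns a plain list instead of a dataframe, and the one-day default step is an assumption, since the text above does not say what step=None resolves to.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Union

DAY_IN_SECONDS = 60 * 60 * 24  # assumed default step; not stated in the docs above


def _to_datetime(value: Union[str, datetime]) -> datetime:
    """Accept either a datetime or an ISO-8601 date string such as '2020-01-01'."""
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def date_range_sketch(
    start_date: Union[str, datetime],
    end_date: Union[str, datetime],
    step: Optional[int] = None,
) -> List[datetime]:
    """Hypothetical stand-in: timestamps from start (inclusive) to end (exclusive)."""
    delta = timedelta(seconds=step or DAY_IN_SECONDS)
    current, end = _to_datetime(start_date), _to_datetime(end_date)
    timestamps = []
    while current < end:
        timestamps.append(current)
        current += delta
    return timestamps


# Three daily timestamps: 2020-01-04 itself is excluded.
daily = date_range_sketch("2020-01-01", "2020-01-04")

# A 12-hour step, given in seconds, over a single day.
half_days = date_range_sketch(datetime(2020, 1, 1), datetime(2020, 1, 2), step=12 * 60 * 60)
```

With the real function, the same arguments would be passed together with a SparkClient, and the result would be a dataframe rather than a list.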

Utils for wrapping a custom or spark function in a namedtuple-like structure.

class butterfree.transform.utils.function.Function(func: parameters_validation.parameter_validation_decorator.parameter_validation.<locals>.func_partial.<locals>.validation_partial, data_type: parameters_validation.parameter_validation_decorator.parameter_validation.<locals>.func_partial.<locals>.validation_partial)

Bases: object

Define a class Function.

Like a namedtuple:

Function = namedtuple("Function", ["function", "data_type"])

func

custom or spark functions, such as avg, std, count. For more information, check the spark functions documentation.

For custom functions, see the path:

‘butterfree/transform/transformations/user_defined_functions’.

data_type

data type for the output columns.

property data_type

Data type for the output columns.

property func

Function to be used in the transformation.
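The pairing of a callable with an output data type can be sketched with a plain namedtuple. This is an illustration only, under stated assumptions: FunctionSketch is a hypothetical name, the field names follow the func and data_type attributes documented above, statistics.mean stands in for a Spark aggregate such as avg, and the string "double" stands in for a real data type object.

```python
from collections import namedtuple
from statistics import mean

# Stand-in for butterfree's Function: pairs a callable with its output type.
FunctionSketch = namedtuple("FunctionSketch", ["func", "data_type"])

# statistics.mean stands in for a Spark aggregate such as avg;
# the string "double" stands in for a real data type object.
avg_as_double = FunctionSketch(func=mean, data_type="double")

# A transformation can read both fields and apply the callable.
result = avg_as_double.func([1.0, 2.0, 3.0])
```

Unlike this sketch, the class documented here validates its arguments, which a bare namedtuple does not do.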

Holds functions for defining windows in DataFrames.

class butterfree.transform.utils.window_spec.FrameBoundaries(mode=None, window_definition=None)

Bases: object

Utility functions for defining the frame boundaries.

Parameters
  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows, it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

get(w)

Returns window with or without the frame boundaries.

property window_size

Returns window size.

property window_unit

Returns window unit.
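The relationship between window_definition, window_size and window_unit can be sketched as a simple string split. This is a hedged illustration, not butterfree's code: it assumes window_definition is a string of the form "<size> <unit>", such as "7 days", and parse_window_definition is a hypothetical helper name.

```python
from typing import Tuple

# Units listed for window_definition in the docs.
KNOWN_UNITS = ("second", "minute", "hour", "day", "week", "year")


def parse_window_definition(window_definition: str) -> Tuple[int, str]:
    """Hypothetical helper: split '7 days' into (7, 'day')."""
    size_text, unit_text = window_definition.split()
    unit = unit_text.lower().rstrip("s")  # accept both 'day' and 'days'
    if unit not in KNOWN_UNITS:
        raise ValueError(f"unsupported window unit: {unit_text!r}")
    return int(size_text), unit


# Size and unit recovered from a single definition string.
window_size, window_unit = parse_window_definition("7 days")
```

Normalizing the unit to its singular form here is a design choice of the sketch; the real properties may return the unit exactly as written.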

class butterfree.transform.utils.window_spec.Window(partition_by, order_by, mode=None, window_definition=None)

Bases: object

Utility functions for defining a window specification.

Parameters
  • partition_by – the partitioning defined.

  • order_by – the ordering defined.

  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows, it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

Use the static methods in Window to create a WindowSpec.

SLIDE_DURATION = '1 day'

get()

Defines a common window to be used both in time and rows windows.

get_name()

Return window suffix name based on passed criteria.

Module contents

This module holds utils to be used by transformations.

class butterfree.transform.utils.Function(func: parameters_validation.parameter_validation_decorator.parameter_validation.<locals>.func_partial.<locals>.validation_partial, data_type: parameters_validation.parameter_validation_decorator.parameter_validation.<locals>.func_partial.<locals>.validation_partial)

Bases: object

Define a class Function.

Like a namedtuple:

Function = namedtuple("Function", ["function", "data_type"])

func

custom or spark functions, such as avg, std, count. For more information, check the spark functions documentation.

For custom functions, see the path:

‘butterfree/transform/transformations/user_defined_functions’.

data_type

data type for the output columns.

property data_type

Data type for the output columns.

property func

Function to be used in the transformation.

class butterfree.transform.utils.Window(partition_by, order_by, mode=None, window_definition=None)

Bases: object

Utility functions for defining a window specification.

Parameters
  • partition_by – the partitioning defined.

  • order_by – the ordering defined.

  • mode – available modes to be used in time aggregations.

  • window_definition – time ranges to be used in the windows, it can be second(s), minute(s), hour(s), day(s), week(s) and year(s).

Use the static methods in Window to create a WindowSpec.

SLIDE_DURATION = '1 day'

get()

Defines a common window to be used both in time and rows windows.

get_name()

Return window suffix name based on passed criteria.