
retrieve_processed_data

Retrieves Parquet data from an S3 bucket, given an object key

Args:
    bucket_name (string)
    object_key (string)
    s3_client (boto3 s3 client)

Raises:
    KeyError: object does not exist
    ConnectionError: connection issue to S3


Returns:
    table_data (pandas dataframe)
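A minimal sketch of how retrieve_processed_data could work. It assumes the object is a single Parquet file and that a Parquet engine such as pyarrow is installed. It maps the S3 client's NoSuchKey error to the documented KeyError. Mapping network failures to ConnectionError is left out, so those propagate as botocore errors. S3 reports a missing key as AccessDenied rather than NoSuchKey when the caller lacks s3:ListBucket permission on the bucket.

```python
import io

import pandas as pd


def retrieve_processed_data(bucket_name: str, object_key: str, s3_client) -> pd.DataFrame:
    """Minimal sketch: load one Parquet object from S3 into a DataFrame."""
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=object_key)
    except s3_client.exceptions.NoSuchKey as err:
        # Surface a missing object as KeyError, as the docstring describes.
        raise KeyError(f"{object_key} not found in {bucket_name}") from err
    # The body is a stream: read it fully, then let pandas parse the Parquet bytes
    # (this requires a Parquet engine such as pyarrow).
    parquet_bytes = response["Body"].read()
    return pd.read_parquet(io.BytesIO(parquet_bytes))
```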

write_table_data_to_warehouse

Writes a pandas DataFrame to the given table in the database

Args:
    data_frame (pd.DataFrame)
    table_name (string)
    db (pg8000 Connection)

Returns:
    response: database response
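A minimal sketch of write_table_data_to_warehouse. It assumes db is a pg8000.native.Connection, whose run(sql, **params) method binds :name placeholders from keyword arguments. The quote_identifier helper is hypothetical. All rows go into a single parameterised INSERT, so the function returns one database response. Very large frames may need to be split into chunks instead.

```python
import pandas as pd


def quote_identifier(name: str) -> str:
    """Hypothetical helper: double-quote a PostgreSQL identifier."""
    return '"' + str(name).replace('"', '""') + '"'


def write_table_data_to_warehouse(data_frame: pd.DataFrame, table_name: str, db):
    """Minimal sketch: insert every row of data_frame into table_name.

    Assumes db is a pg8000.native.Connection.
    """
    if data_frame.empty:
        return None  # nothing to insert; avoids emitting invalid SQL

    columns = ", ".join(quote_identifier(c) for c in data_frame.columns)
    params = {}
    value_groups = []
    for r, row in enumerate(data_frame.itertuples(index=False, name=None)):
        names = []
        for c, value in enumerate(row):
            # Generated keys avoid clashing with run()'s own keyword arguments.
            key = f"r{r}c{c}"
            params[key] = value
            names.append(f":{key}")
        value_groups.append("(" + ", ".join(names) + ")")

    query = (
        f"INSERT INTO {quote_identifier(table_name)} ({columns}) "
        f"VALUES {', '.join(value_groups)};"
    )
    return db.run(query, **params)
```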

create_dim_date

Creates a table of dates in the given range
with columns for:
    - year
    - month
    - month name
    - day of month
    - day of year
    - day of week
    - quarter


Args:
    start_date (datetime)
    end_date (datetime)

Returns:
    dates (pandas dataframe)
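A minimal sketch of create_dim_date built on pandas.date_range, which includes both endpoints by default. The column names, including the date_id key column, are assumptions. Day of week follows the pandas convention (Monday = 0).

```python
from datetime import datetime

import pandas as pd


def create_dim_date(start_date: datetime, end_date: datetime) -> pd.DataFrame:
    """Minimal sketch: one row per calendar day from start_date to end_date."""
    dates = pd.date_range(start=start_date, end=end_date, freq="D")
    return pd.DataFrame(
        {
            "date_id": dates.date,            # assumed key column (datetime.date)
            "year": dates.year,
            "month": dates.month,
            "month_name": dates.month_name(),  # e.g. "January"
            "day": dates.day,                 # day of month
            "day_of_year": dates.dayofyear,
            "day_of_week": dates.dayofweek,   # Monday = 0 ... Sunday = 6
            "quarter": dates.quarter,
        }
    )
```

For example, a range from 30 December 2024 to 1 January 2025 yields three rows whose day_of_year values are 365, 366 and 1, since 2024 is a leap year.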

get_timestamp

Returns the timestamp of the most recent entry from the given table
that has been processed by the ingestion lambda.

Args:
    table_name (str): table name to get timestamp for
    client (boto3 SSM Client)

Raises:
    KeyError: table_name does not exist
    ConnectionError: connection issue to Parameter Store

Returns:
    timestamp (datetime) : stored timestamp of the most recent
    ingested data for the given table
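A minimal sketch of get_timestamp using the boto3 SSM client's get_parameter call. The parameter naming scheme (latest_extracted_<table_name>) and the ISO 8601 storage format are assumptions, not taken from the project. A missing parameter (ParameterNotFound) becomes the documented KeyError. Mapping connection failures to ConnectionError is left out.

```python
from datetime import datetime


def get_timestamp(table_name: str, client) -> datetime:
    """Minimal sketch: read the latest ingested timestamp for table_name."""
    # Hypothetical naming scheme; the real parameter names are not documented here.
    parameter_name = f"latest_extracted_{table_name}"
    try:
        response = client.get_parameter(Name=parameter_name)
    except client.exceptions.ParameterNotFound as err:
        raise KeyError(table_name) from err
    # Assumes the value was stored as ISO 8601 text, e.g. "2024-05-01T12:30:00".
    return datetime.fromisoformat(response["Parameter"]["Value"])
```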

write_timestamp

Writes the timestamp to Parameter Store for the given table

Args:
    timestamp (timestamp) : timestamp of the latest extracted data
    table_name (str) : table name to store timestamp for
    client (boto3 SSM Client) : client passed in to avoid recreating it on each invocation

Raises:
    ConnectionError: connection issue to Parameter Store

Returns:
    None
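A minimal sketch of write_timestamp using the boto3 SSM client's put_parameter call with Overwrite=True, so each run replaces the previous value. The parameter naming scheme and ISO 8601 storage format are assumptions. Connection errors are not mapped to ConnectionError here.

```python
from datetime import datetime


def write_timestamp(timestamp: datetime, table_name: str, client) -> None:
    """Minimal sketch: store the latest extracted timestamp for table_name."""
    # Hypothetical naming scheme; the real parameter names are not documented here.
    parameter_name = f"latest_extracted_{table_name}"
    client.put_parameter(
        Name=parameter_name,
        Value=timestamp.isoformat(),  # ISO 8601 text, readable via datetime.fromisoformat
        Type="String",
        Overwrite=True,  # replace the value written by the previous run
    )
```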