Using SQLAlchemy ORM for Python: in my REST API, how can I aggregate resources by hour within each day?

Time:08-30

I have a MySQL table that looks like this:

time_slot            | sales
2022-08-26T01:00:00  | 100
2022-08-26T01:06:40  | 103
...

I am serving the data to a client via an API. The FE engineer wants the data aggregated by hour for each day within the query period (currently one week). He supplies `from` and `to` parameters and wants the sum of sales within each hour of each day as a nested array. Because the period is a week, the result is a 7-element array, where each element is an array containing all the hourly slots for which we have data.

[
  {
    "07:00": 567,
    "08:00": 657,
    ...
  },
  {}, {}, ...
]

The API is built in Python, with an ORM (SQLAlchemy) model for the data that looks like:

class HourlyData(Base):
    hour = Column(DateTime)
    sales = Column(Float)

I can query the hourly data and then aggregate it into a list of lists in Python, in memory. But to save compute time (and conceptual complexity), I would like to push the aggregation into the ORM query.
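For comparison, the in-memory approach looks roughly like the sketch below. The `rows` sample and the output shape are assumptions for illustration, not data from the real table:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows as (time_slot, sales) tuples, e.g. fetched via the ORM.
rows = [
    (datetime(2022, 8, 26, 1, 0), 100.0),
    (datetime(2022, 8, 26, 1, 30), 103.0),
    (datetime(2022, 8, 27, 8, 15), 50.0),
]

# Sum sales per (date, hour) bucket in plain Python.
buckets = defaultdict(float)
for ts, sales in rows:
    buckets[(ts.date(), ts.hour)] += sales

# Nest into one mapping per day: {"HH:00": total, ...}
days = defaultdict(dict)
for (day, hour), total in sorted(buckets.items()):
    days[day][f"{hour:02d}:00"] = int(total)

result = list(days.values())
# result == [{"01:00": 203}, {"08:00": 50}]
```

This works, but it pulls every raw row over the wire, which is exactly what the question wants to avoid.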

What is the SQLAlchemy syntax to achieve this?

CodePudding user response:

The below should get you started. The solution mixes SQL and Python using existing tools, and it should work with most RDBMSs.

  1. Assumed model definition and imports:
from itertools import groupby

from sqlalchemy import Column, DateTime, Float, Integer, func

# `Base` (declarative base) and `session` are assumed to exist already
class TimelyData(Base):
    __tablename__ = "timely_data"
    id = Column(Integer, primary_key=True)
    time_slot = Column(DateTime)
    sales = Column(Float)
  2. We get the data from the DB, aggregated enough for us to group it properly:
# date_trunc below works for PostgreSQL (tested); note that MySQL has no
# date_trunc, so there you would use DATE_FORMAT instead (see note below)
# see: https://mode.com/blog/date-trunc-sql-timestamp-function-count-on
col_hour = func.date_trunc("hour", TimelyData.time_slot)

q = (
    session.query(
        col_hour.label("hour"),
        func.sum(TimelyData.sales).label("total_sales"),
    )
    .group_by(col_hour)
    .order_by(col_hour)  # important: the `groupby` call later relies on sorted input
)
  3. Group the results by date again, using Python's groupby:
groups = groupby(q.all(), key=lambda row: row.hour.date())

# truncate and format the final list as required
data = [
    [(f"{row.hour:%H}:00", int(row.total_sales)) for row in rows]
    for _, rows in groups
]
  4. Example result: [[["01:00", 201], ["02:00", 102]], [["01:00", 103]], [["08:00", 104]]]
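To serve this over the API, the nested pair lists can be dumped to JSON directly (tuples serialize as arrays), or each day's pairs can first be converted into an object keyed by hour, matching the shape the FE engineer asked for. A sketch, using a hypothetical `data` value shaped like the example result above:

```python
import json

# Hypothetical output of the grouping step above.
data = [[("01:00", 201), ("02:00", 102)], [("01:00", 103)]]

# Option A: dump the pairs as-is; tuples become JSON arrays.
payload = json.dumps(data)

# Option B: one object per day, keyed by hour label.
payload_objs = json.dumps([dict(day) for day in data])
print(payload_objs)
# [{"01:00": 201, "02:00": 102}, {"01:00": 103}]
```

Option B is closer to the `{"07:00": 567, ...}` shape sketched in the question; option A preserves ordering guarantees that JSON objects do not formally make.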

I am not familiar with MySQL, but with PostgreSQL one could implement all of this at the DB level thanks to its extensive JSON support. However, I would argue that readability would not improve, and neither would speed, given that we fetch at most 168 rows (7 days x 24 hours) from the database.
