I have a MySQL db table that looks like:
time_slot | sales
2022-08-26T01:00:00 | 100
2022-08-26T01:06:40 | 103
...
I am serving the data via an API to a client. The FE engineer wants the data aggregated by hour for each day within the query period (at the moment it's a week). So he gives `from` and `to` and wants the sum of sales within each hour for each day as a nested array. Because it's a week, it's a 7-element array, where each element is an array containing all the hourly slots for which we have data.
[
    [
        "07:00": 567,
        "08:00": 657,
        ....
    ],
    [], [], ...
]
The API is built in Python. There is an ORM model (SQLAlchemy) for the data that looks like:
class HourlyData(Base):
    hour = Column(DateTime)
    sales = Column(Float)
I can query the hourly data and then, in Python memory, aggregate it into a list of lists. But to save compute time (and conceptual complexity), I would like to run the aggregation through ORM queries.
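For reference, the in-memory version I mean is roughly this (just a sketch; the helper name is mine, and `rows` is assumed to be already-fetched HourlyData instances):
from collections import defaultdict

def aggregate_in_memory(rows):  # hypothetical helper name
    # bucket rows by calendar day, then sum sales per "HH:00" slot within the day
    days = defaultdict(lambda: defaultdict(float))
    for row in rows:
        days[row.hour.date()][f"{row.hour:%H}:00"] += row.sales
    # one {"HH:00": total} mapping per day, ordered by date
    return [dict(days[d]) for d in sorted(days)]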
What is the SQLAlchemy syntax to achieve this?
CodePudding user response:
The below should get you started. The solution is a mix of SQL and Python using existing tools, and the approach should carry over to any RDBMS (though the truncation function may differ by dialect).
- Assumed model definition, and imports

from itertools import groupby

from sqlalchemy import Column, DateTime, Float, Integer, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class TimelyData(Base):
    __tablename__ = "timely_data"

    id = Column(Integer, primary_key=True)
    time_slot = Column(DateTime)
    sales = Column(Float)
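- The snippets below also assume a working session; a minimal sketch (the connection URL is a placeholder):

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("mysql+pymysql://user:pass@localhost/shop")  # placeholder URL
session = Session(engine)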
- We get the data from the DB aggregated enough for us to group properly
# below works for PostgreSQL (tested); MySQL has no date_trunc, so a
# dialect-specific equivalent (e.g. func.date_format) would be needed there
# see: https://mode.com/blog/date-trunc-sql-timestamp-function-count-on
col_hour = func.date_trunc("hour", TimelyData.time_slot)
q = (
    session.query(
        col_hour.label("hour"),
        func.sum(TimelyData.sales).label("total_sales"),
    )
    .group_by(col_hour)
    .order_by(col_hour)  # this is important for the `groupby` call later on
)
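- The question mentions a from/to window; assuming `t_from` and `t_to` arrive as datetime objects, the same query can be restricted with a filter (a sketch):

q = q.filter(
    TimelyData.time_slot >= t_from,
    TimelyData.time_slot < t_to,  # half-open interval: the end boundary is excluded
)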
- Group the results by date in Python, using `groupby`
groups = groupby(q.all(), key=lambda row: row.hour.date())
# truncate and format the final list as required
data = [
    [[f"{row.hour:%H}:00", int(row.total_sales)] for row in rows]
    for _, rows in groups
]
- Example result:
[[["01:00", 201], ["02:00", 102]], [["01:00", 103]], [["08:00", 104]]]
I am not familiar with MySQL, but with PostgreSQL one could implement all of this at the DB level thanks to its extensive JSON support. However, I would argue that the readability of that implementation would not improve, and neither would the speed, given that we get from the database at most 168 rows (= 7 days x 24 hours).
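For completeness, an untested sketch of that all-in-the-database variant for PostgreSQL, using its json_agg/json_object_agg aggregates (table and column names as assumed above; `t_from`/`t_to` as before):

from sqlalchemy import text

pg_only = text("""
    SELECT json_agg(hours ORDER BY day)
    FROM (
        SELECT date_trunc('day', hour) AS day,
               json_object_agg(to_char(hour, 'HH24":00"'), total_sales
                               ORDER BY hour) AS hours
        FROM (
            SELECT date_trunc('hour', time_slot) AS hour,
                   sum(sales) AS total_sales
            FROM timely_data
            WHERE time_slot >= :t_from AND time_slot < :t_to
            GROUP BY 1
        ) per_hour
        GROUP BY 1
    ) per_day
""")

# returns one JSON array: a list of {"HH:00": total} objects, one per day
result = session.execute(pg_only, {"t_from": t_from, "t_to": t_to}).scalar()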