Home > OS >  Converting event-wise table to timeseries
Converting event-wise table to timeseries

Time:10-12

I have an SQLite database (with Django as ORM) with a table of change events (an Account is assigned a new Strategy). I would like to convert it to a timeseries, to have on each day the Strategy the Account was following.

My table :

enter image description here

Expected output :

enter image description here

As showed, there can be more than 1 change per day. In this case I select the last change of the day, as the desired timeseries output must have only one value per day.

My question is similar to this one but in SQL, not BigQuery (but I'm not sure I understood the unnest part they propose). I have a working solution in Pandas with reindex and fillna, but I'm sure there is an elegant and simple solution in SQL (maybe even better with Django ORM).

CodePudding user response:

You can use a RECURSIVE Common Table Expression to generate all dates between first and last and then join this generated table with your data to get the needed value for each day:

WITH RECURSIVE daterange(d) AS (
  SELECT date(min(created_at)) from events 
  UNION ALL 
  SELECT date(d,'1 day') FROM daterange WHERE d<(select max(created_at) from events)
)
SELECT d, account_id, strategy_id
FROM daterange JOIN events
WHERE created_at = (select max(e.created_at) from events e where e.account_id=events.account_id and date(e.created_at) <= d)
GROUP BY account_id, d
ORDER BY account_id, d

date() function is used to convert a datetime value to a simple date, so you can use it to group your data by date. date(d, '1 day') applies a modifier of 1 calendar day to d.

Here is an example with your data:

CREATE TABLE events (
  created_at,
  account_id,
  strategy_id
);
insert into events 
VALUES ('2022-10-07 12:53:53', 4801323843, 7),
('2022-10-07 08:10:07', 4801323843, 5),
('2022-10-07 15:00:45', 4801323843, 8),
('2022-10-10 13:01:16', 4801323843, 6);
WITH RECURSIVE daterange(d) AS (
  SELECT date(min(created_at)) from events 
  UNION ALL 
  SELECT date(d,'1 day') FROM daterange WHERE d<(select max(created_at) from events)
)
SELECT d, account_id, strategy_id
FROM daterange JOIN events
WHERE created_at = (select max(e.created_at) from events e where e.account_id=events.account_id and date(e.created_at) <= d)
GROUP BY account_id, d
ORDER BY account_id, d
d account_id strategy_id
2022-10-07 4801323843 8
2022-10-08 4801323843 8
2022-10-09 4801323843 8
2022-10-10 4801323843 6
2022-10-11 4801323843 6

fiddle

The query could be slow with many rows. In that case create an index on the created_at column:

CREATE INDEX events_created_idx ON events(created_at);

CodePudding user response:

My final version is the version proposed by @Andrea B., with just a slight improve in performance, merging only the rows that we need in the join, and therefore discarding the where clause. I also converted the null to date('now')
Here is the final version I used :

with recursive daterange(day) as 
  (
    select min(date(created_at)) from events
    union all select date(day, '1 day') from daterange
    where day < date('now')
  ),
  events as (
    select account_id, strategy_id, created_at as start_date, 
    case lead(created_at) over(partition by account_id order by created_at) is null
      when True then datetime('now')
      else lead(created_at) over(partition by account_id  order by created_at) 
    end as end_date
    from events
  )
  select * from daterange
  join events on events.start_date<daterange.day and daterange.day<events.end_date
  order by events.account_id

Hope this helps !

  • Related