How to select data with an unusual grouping by date?

Time:11-23

There is a table:

id       direction_id  created_at
1        2             22 November 2021, 16:00:00
2        2             22 November 2021, 16:20:00
43       2             22 November 2021, 16:25:00
455      1             22 November 2021, 16:27:00
6567     2             22 November 2021, 17:36:00
674556   2             22 November 2021, 20:01:00
5243554  1             22 November 2021, 20:50:00
5243554  1             22 November 2021, 21:46:00

I need to get the following result:

1 2 created_at_by_hour
1 3 22.11.21 17
1 4 22.11.21 18
1 4 22.11.21 19
1 4 22.11.21 20
2 5 22.11.21 21
3 5 22.11.21 22

The 1 and 2 in the header are the possible direction_id values in the table. created_at is truncated to the hour, and for each hour I need to count how many records satisfy created_at <= created_at_by_hour. If no records were created during some hour, that hour should simply repeat the previous hour's counts.

The table has three columns: id (int), direction_id (int) and created_at (timestamptz). I need an hourly report (based on the created_at field) with the number of records created before each "grouped" hour. I need this count not as a single total but separately for each direction_id (there are only two of them, 1 and 2). If no records were created for a given direction_id in a given hour, repeat the previous value, but the result should end at the last created_at. created_at is the time when the record was created.
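To make the counting rule concrete, here is a short Python sketch that works through the arithmetic on the sample rows. It is not SQL. It assumes the hour boundaries run from the hour after the first record to the hour after the last, and that a record counts toward boundary H when created_at <= H. Under those assumptions it reproduces the expected result table. The helper names `truncate_to_hour` and `cumulative_by_hour` are hypothetical, and the timestamps are naive datetimes rather than timestamptz.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (id, direction_id, created_at).
# Assumption: naive datetimes stand in for the timestamptz column.
ROWS = [
    (1, 2, datetime(2021, 11, 22, 16, 0)),
    (2, 2, datetime(2021, 11, 22, 16, 20)),
    (43, 2, datetime(2021, 11, 22, 16, 25)),
    (455, 1, datetime(2021, 11, 22, 16, 27)),
    (6567, 2, datetime(2021, 11, 22, 17, 36)),
    (674556, 2, datetime(2021, 11, 22, 20, 1)),
    (5243554, 1, datetime(2021, 11, 22, 20, 50)),
    (5243554, 1, datetime(2021, 11, 22, 21, 46)),
]


def truncate_to_hour(ts):
    """Drop minutes, seconds and microseconds (the analogue of date_trunc('hour', ...))."""
    return ts.replace(minute=0, second=0, microsecond=0)


def cumulative_by_hour(rows, directions=(1, 2)):
    """For each hour boundary H, count rows per direction with created_at <= H."""
    events = sorted((created_at, direction) for _, direction, created_at in rows)
    # Boundaries run from the hour after the first record to the hour after the last.
    hour = truncate_to_hour(events[0][0]) + timedelta(hours=1)
    last = truncate_to_hour(events[-1][0]) + timedelta(hours=1)
    counts = Counter()
    i = 0
    result = []
    while hour <= last:
        # Add every record created at or before this boundary to the running totals.
        while i < len(events) and events[i][0] <= hour:
            counts[events[i][1]] += 1
            i += 1
        # An hour with no new records simply repeats the previous totals.
        result.append(tuple(counts[d] for d in directions) + (hour.strftime("%d.%m.%y %H"),))
        hour += timedelta(hours=1)
    return result


for row in cumulative_by_hour(ROWS):
    print(*row)
```

Because the totals are cumulative, the "duplicate the previous hour" requirement needs no special handling: an hour without new records leaves the counters unchanged.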

Answer:

In my opinion, it is better to generate a series of hourly timestamps between the minimum and maximum dates and then count the records of each direction created up to each hour.


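with time_range as (
  select
    -- first boundary: the hour after the earliest record
    date_trunc('hour', min(created_at)) + interval '1 hour' as min,
    -- last boundary: the hour after the latest record
    date_trunc('hour', max(created_at)) + interval '1 hour' as max
  from test
)
select
  count(*) filter (where direction_id = 1) as "1",
  count(*) filter (where direction_id = 2) as "2",
  to_char(gs.hour, 'dd.mm.yy HH24') as created_at_by_hour
from
  test t
  cross join time_range tr
  -- one row per hour boundary; each record joins every boundary at or after it
  inner join generate_series(tr.min, tr.max, interval '1 hour') gs(hour)
    on t.created_at <= gs.hour
group by gs.hour
order by gs.hour;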
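The same "hourly boundaries plus a `<=` join" idea can be tried without a PostgreSQL server. The sketch below is an adaptation to SQLite through Python's standard `sqlite3` module, not the PostgreSQL query itself. Several things are assumptions: a recursive CTE stands in for `generate_series`, `sum(case ...)` stands in for `count(*) filter (...)`, boundaries are truncated to whole hours, created_at is stored as ISO-8601 text because SQLite has no timestamptz type, and the helper `hourly_running_counts` is hypothetical.

```python
import sqlite3
from datetime import datetime

# Sample rows from the question. Assumption: created_at is ISO-8601 text,
# which compares in chronological order as plain strings.
ROWS = [
    (1, 2, "2021-11-22 16:00:00"),
    (2, 2, "2021-11-22 16:20:00"),
    (43, 2, "2021-11-22 16:25:00"),
    (455, 1, "2021-11-22 16:27:00"),
    (6567, 2, "2021-11-22 17:36:00"),
    (674556, 2, "2021-11-22 20:01:00"),
    (5243554, 1, "2021-11-22 20:50:00"),
    (5243554, 1, "2021-11-22 21:46:00"),
]

# SQLite adaptation (assumption): recursive CTE instead of generate_series,
# sum(case ...) instead of FILTER.
QUERY = """
with recursive hours(hour, last_hour) as (
  -- first and last boundary: the hour after the earliest and latest record
  select datetime(strftime('%Y-%m-%d %H:00:00', min(created_at)), '+1 hour'),
         datetime(strftime('%Y-%m-%d %H:00:00', max(created_at)), '+1 hour')
  from test
  union all
  -- step one hour at a time until the last boundary is reached
  select datetime(hour, '+1 hour'), last_hour from hours where hour < last_hour
)
select
  sum(case when t.direction_id = 1 then 1 else 0 end) as "1",
  sum(case when t.direction_id = 2 then 1 else 0 end) as "2",
  h.hour
from hours h
join test t on t.created_at <= h.hour  -- each record counts toward every later boundary
group by h.hour
order by h.hour
"""


def hourly_running_counts(rows):
    """Run QUERY on an in-memory table; format the hour like to_char(..., 'dd.mm.yy HH24')."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table test (id integer, direction_id integer, created_at text)")
        conn.executemany("insert into test values (?, ?, ?)", rows)
        return [
            (d1, d2, datetime.strptime(hour, "%Y-%m-%d %H:%M:%S").strftime("%d.%m.%y %H"))
            for d1, d2, hour in conn.execute(QUERY)
        ]
    finally:
        conn.close()


for row in hourly_running_counts(ROWS):
    print(*row)
```

Truncating the boundaries to the hour matters once the earliest record is not exactly on the hour. Without it, the series would step through times such as 17:05, 18:05 and so on, and records in the first minutes of each hour would be counted under the wrong label.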

Answer:

Truncate the date down to the hour, group by it and count. Then use SUM OVER to get a running total of the counts. In order to show missing hours in the table, you must generate a series of hours and outer join your data.

with hourly as
(
  -- each record's creation time truncated to the hour
  select date_trunc('hour', created_at) as hour, direction_id from mytable
)
, hours(hour) as
(
  -- every hour from the first to the last one with data, including empty hours
  select *
  from generate_series
  (
    (select min(hour) from hourly), (select max(hour) from hourly), interval '1 hour'
  )
)
select
  hours.hour,
  -- per-hour counts turned into running totals by SUM ... OVER
  sum(count(*) filter (where hourly.direction_id = 1)) over (order by hour) as "1",
  sum(count(*) filter (where hourly.direction_id = 2)) over (order by hour) as "2"
from hours
left join hourly using (hour)
group by hour
order by hour;

Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=21d0c838452a09feac4ebc57906829f4
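The truncate, group, outer join and running-total steps can also be sketched in SQLite through Python's standard `sqlite3` module. This is an adaptation, not the PostgreSQL query above. The assumptions are: `strftime` stands in for `date_trunc`, a recursive CTE stands in for `generate_series`, `sum(case ...)` stands in for FILTER, and the per-hour counts are computed in a separate CTE before the window `sum(...) over (order by hour)` is applied. Timestamps are stored as ISO text, window functions require SQLite 3.25 or later, and the helper `running_counts_by_hour` is hypothetical.

```python
import sqlite3

# Sample rows from the question; created_at as ISO-8601 text (assumption).
ROWS = [
    (1, 2, "2021-11-22 16:00:00"),
    (2, 2, "2021-11-22 16:20:00"),
    (43, 2, "2021-11-22 16:25:00"),
    (455, 1, "2021-11-22 16:27:00"),
    (6567, 2, "2021-11-22 17:36:00"),
    (674556, 2, "2021-11-22 20:01:00"),
    (5243554, 1, "2021-11-22 20:50:00"),
    (5243554, 1, "2021-11-22 21:46:00"),
]

QUERY = """
with recursive
hourly as (
  -- truncate each creation time to the hour (stand-in for date_trunc)
  select strftime('%Y-%m-%d %H:00:00', created_at) as hour, direction_id from mytable
),
hours(hour, last_hour) as (
  -- every hour from the first to the last one with data (stand-in for generate_series)
  select min(hour), max(hour) from hourly
  union all
  select datetime(hour, '+1 hour'), last_hour from hours where hour < last_hour
),
per_hour as (
  -- records created within each hour; empty hours get zeros via the outer join
  select h.hour,
         sum(case when hourly.direction_id = 1 then 1 else 0 end) as d1,
         sum(case when hourly.direction_id = 2 then 1 else 0 end) as d2
  from hours h
  left join hourly on hourly.hour = h.hour
  group by h.hour
)
-- running totals over the hours
select hour,
       sum(d1) over (order by hour) as "1",
       sum(d2) over (order by hour) as "2"
from per_hour
order by hour
"""


def running_counts_by_hour(rows):
    """Return (hour, running count for direction 1, running count for direction 2)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table mytable (id integer, direction_id integer, created_at text)")
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in running_counts_by_hour(ROWS):
    print(*row)
```

With the sample data, each row is labelled with the hour in which the records were created and includes everything up to the end of that hour. The labels therefore run from 16 to 21, one hour earlier than the question's 17 to 22, while the counts match row for row. If the question's labelling is wanted, adding one hour to the label should line the two up (hour + interval '1 hour' in PostgreSQL).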
