Home > Software design >  How to split and aggregate days into different month
How to split and aggregate days into different month

Time:04-05

db fiddle

run select *, return_date - pickup_date as total from order_history order by id; return the following result:

id  pickup_date return_date date_ranges             total
1   2020-03-01  2020-03-12  [2020-03-01,2020-04-01) 11
2   2020-03-01  2020-03-22  [2020-03-01,2020-04-01) 21
3   2020-03-11  2020-03-22  [2020-03-01,2020-04-01) 11
4   2020-02-11  2020-03-22  [2020-02-01,2020-03-01) 40
5   2020-01-01  2020-01-22  [2020-01-01,2020-02-01) 21
6   2020-01-01  2020-04-22  [2020-01-01,2020-02-01) 112

for example:

--id=6. total = 112. 112 = 22  31   29   30
--therefore toal should split: jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.

first split then aggregate. aggregate based over range min(pickup_date), max(return_date) then tochar cast to 'YYYY-MM'; In this case the aggregate should group by 2020-01, 2020-02, 2020-03,2020-04.

but if pickup_date in the same month with return_date then compuate return_date - pickup_date then aggregate/sum the result, group by to_char(pickup_date,'YYYY-MM')

CodePudding user response:

step-by-step demo: db<>fiddle

Not quite perfect, but a sketch:

SELECT
    id,
    ARRAY_AGG(                                                       -- 4
        LEAST(return_date, gs   interval '1 month - 1 day')          -- 2
        - GREATEST(pickup_date, gs)                                  -- 3
          interval '1 day'              
    )
FROM order_history,
    generate_series(                                                 -- 1
       date_trunc('month', pickup_date), 
       date_trunc('month', return_date), 
       interval '1 month'
    ) gs
GROUP BY id
  1. Generate a set of months that are included in the given date range
  2. a) Calculate the last day of the month (first of a month 1 month is first of the next month; minus 1 day is last of the current month). This is the max day for returning in this month. b) if it happened earlier, then take the earler day (LEAST())
  3. Same for pickup day. Afterwards calculate the difference of the days kept in one month.
  4. Aggregate the values for one month.

Open questions / Potential enhancements:

  1. You said:

jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.

Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?

If you don't want to count the very first day, then you can change (3.) to

GREATEST(pickup_date   interval '1 day', gs) 
  1. There's a problem with day saving time in March (30 days, 23 hours instead of 31 days). This can be faced by some rounding, for example.
  • Related