R's ceiling_date equivalent in SQL-CodePudding

I want to implement R's ceiling_date fucntion in SQL (Postgresql).

So I have dates in a column for everyday with corresponding sales and I want to accumulate the sales for a week over a single date (say Friday).

Input Format:

Dates in yellow are the dates to aggregate sales on

Expected output format:

This can easily be done in R using ceiling_date but I want to do it in SQL itself.

Any help would be appreciated. Thanks

CodePudding user response：

Accepting and processing the ISO 8601 Standard is by far the easiest for processing date ranges. But this imposes a standard definition, which is essentially:

All weeks consist on exactly 7 days.
All weeks begin on Monday.
The first week of the year is the week the contains 4-Jan.

The date_trunc function gives the first date of the week, adding 6 gives the last day of the week.

-- ISO 8601 Week definition 
select (date_trunc('week',dte)::date  6)  "Week Ending"
     , sum(sales)                         "Total Sales"
  from test    
  group by (date_trunc('week',dte)::date  6)
  order by (date_trunc('week',dte)::date  6);

Date/Week processing for non ISO 8601 presents somewhat tricky process to get the appropriate week definition. The following does so for week Friday - Thursday definition. It creates a date range for a year beginning with the first Friday in the table, then joins using the range contains operator to determine the appropriate summation period

with periods (wk) as 
     ( select daterange( ((min_dt   (n-1) * interval '1 week'))::date     
                       , ((min_dt   (n)   * interval '1 week'))::date 
                       , '(]'
                       ) 
        from (select min(dte) min_dt 
                from test
               where extract(dow from dte) = 5     --- Day_Of_Week (5) = Friday
             ) s
        cross join generate_series(0,52) gs(n) 
     ) --select * from periods;
select upper(wk)-1   "Week Ending" 
    ,  sum(sales)    "Total Sales"
  from periods
  join test 
    on (dte <@ wk)
group by upper(wk)-1 
order by upper(wk)-1;

See demo of both here. NOTE: Demo changes sample date from January (2022-01-01 ...) to May (2022-05-01 ...) as 6-January-2022 was Thursday not Friday as description, 6-May-2022 is however Friday. Also the sum of values ending 6-May is 38 (not 42 as indicated). Finally, neither query attempts a limiting date, but processed through end-of-data. Nor does either address multiple years of data.

CodePudding user response：

demo

idea: for 2022-Janurary-1 to 2022-Janurary-20, there is 3 Fridays:'2022-01-07','2022-01-14', '2022-01-21'. We need to partition by these 3 friday order by sales date. Now the problem is now to compute get all these date belong to these 3 fridays.

get every friday each sales_date belong to.
deal with special cases(one week after friday: saturday, sunday) when sales_date > friday then the real friday is next friday.

final code:

SELECT
    *,
    sum(amount) OVER (PARTITION BY sales.compute_friday ORDER BY sales_date)
FROM
    sales;

processing code:

BEGIN;
CREATE TABLE sales (
    sales_date date
    , amount numeric
);
INSERT INTO sales (sales_date , amount)
SELECT
    i
    , (random() * 10)::integer
FROM
    generate_series('2022-01-01'::timestamp , '2022-01-20'::timestamp , interval '1 day') g (i);
ALTER TABLE sales
    ADD COLUMN friday date;
UPDATE
    sales
SET
    friday = (date_trunc('week' , sales_date)   interval '4 day')::date;
ALTER TABLE sales
    ADD COLUMN compute_friday date;
UPDATE
    sales
SET
    compute_friday = CASE WHEN sales_date > friday THEN
        (friday   interval '7 days')::date
    ELSE
        friday
    END;
COMMIT;