Home > Back-end >  SQL query for efficient custom grouping
SQL query for efficient custom grouping

Time:07-22

I need to compute result for series of dates (i.e. ['2020-09-01', '2020-09-02', '2020-09-03']), this is SQL query that works for single date ('2020-09-01'), but I need to do it for multiple days. Maybe it is possible to iterate through dates, like in for loop, but not sure if this is efficient way. I was wondering maybe there is a way to use some sort of custom group by, single group is what we get from where statement for single day, but problem is that same record can be present in multiple different groups. Any ideas how to make such SQL query, which would be efficient as well? This needs to work for Redshift SQL.

NOTE: there can be up to a thousand days

Input data:

column_1 column_2 date_column_1 date_column_2
10 2 '2020-07-01' '2020-09-02'
30 4 '2019-08-02' '2020-09-02'
50 6 '2020-08-03' '2020-09-03'
70 8 '2019-08-04' '2020-09-03'
90 2 '2020-07-05' '2020-09-04'
10 4 '2019-08-06' '2020-09-05'
30 6 '2020-08-07' '2020-09-06'

Expected result:

result_date computed_result
'2020-09-01' 36.9
'2020-09-02' 44.8
'2020-09-03' 39.3
WITH temp_table AS (
    SELECT 
    column_1 - column_2 AS computed_column
    FROM table_name 
    WHERE (date_column_1 > '2020-09-01' AND date_column_2 < '2020-09-01') 
    OR (date_column_1 < '2020-09-01' AND date_column_2 > '2020-09-01')) 
    SELECT
    '2020-09-01' as result_date,
    avg(computed_column) as computed_result
    FROM temp_table 

CodePudding user response:

You can try the following.

Generate the dates table first.

DROP TABLE IF EXISTS tmp_dates;
CREATE TEMP TABLE tmp_dates
AS
SELECT
  '2020-09-01':: DATE AS i_date
UNION ALL
SELECT
  '2020-09-02':: DATE AS i_date
UNION ALL
SELECT
  '2020-09-03':: DATE AS i_date
;

Dummy data:

DROP TABLE IF EXISTS table_name;
CREATE TEMP TABLE table_name
AS
SELECT
  5                  AS column_1,
  4                  AS column_2,
  '2020-09-01'::DATE AS date_column_1,
  '2020-09-01'::DATE AS date_column_2
UNION ALL
SELECT
  8                  AS column_1,
  4                  AS column_2,
  '2020-09-01'::DATE AS date_column_1,
  '2020-09-03'::DATE AS date_column_2
UNION ALL
SELECT
  6                  AS column_1,
  4                  AS column_2,
  '2020-09-03'::DATE AS date_column_1,
  '2020-09-03'::DATE AS date_column_2
;

Query to calculate avg.

SELECT
  d.i_date                          AS result_date,
  AVG((column_1 - column_2)::FLOAT) AS computed_result
FROM
  table_name t
  JOIN tmp_dates d ON (t.date_column_1 >= d.i_date AND t.date_column_2 <= d.i_date)
    OR (t.date_column_1 <= d.i_date AND t.date_column_2 >= d.i_date)
GROUP BY
  result_date
ORDER BY
  result_date
;

PS For dates generation, if the list of dates is huge (e.g. a whole year - 365 days). You may wanna consider using a date generator like (this).

-- generate every day between two timestamps
CALL generate_ts_every_x_between_two_timestamps('2020-01-01 00:00:00' , '2020-01-30 00:00:00', 1, 'day');
CREATE TEMP TABLE tmp_dates AS SELECT ts::DATE as i_date FROM tmp_generated_time_series;
  • Related