I need to compute result for series of dates (i.e. ['2020-09-01', '2020-09-02', '2020-09-03']), this is SQL query that works for single date ('2020-09-01'), but I need to do it for multiple days. Maybe it is possible to iterate through dates, like in for loop, but not sure if this is efficient way. I was wondering maybe there is a way to use some sort of custom group by, single group is what we get from where statement for single day, but problem is that same record can be present in multiple different groups. Any ideas how to make such SQL query, which would be efficient as well? This needs to work for Redshift SQL.
NOTE: there can be up to a thousand days
Input data:
column_1 | column_2 | date_column_1 | date_column_2 |
---|---|---|---|
10 | 2 | '2020-07-01' | '2020-09-02' |
30 | 4 | '2019-08-02' | '2020-09-02' |
50 | 6 | '2020-08-03' | '2020-09-03' |
70 | 8 | '2019-08-04' | '2020-09-03' |
90 | 2 | '2020-07-05' | '2020-09-04' |
10 | 4 | '2019-08-06' | '2020-09-05' |
30 | 6 | '2020-08-07' | '2020-09-06' |
Expected result:
result_date | computed_result |
---|---|
'2020-09-01' | 36.9 |
'2020-09-02' | 44.8 |
'2020-09-03' | 39.3 |
WITH temp_table AS (
SELECT
column_1 - column_2 AS computed_column
FROM table_name
WHERE (date_column_1 > '2020-09-01' AND date_column_2 < '2020-09-01')
OR (date_column_1 < '2020-09-01' AND date_column_2 > '2020-09-01'))
SELECT
'2020-09-01' as result_date,
avg(computed_column) as computed_result
FROM temp_table
CodePudding user response:
You can try the following.
Generate the dates table first.
DROP TABLE IF EXISTS tmp_dates;
CREATE TEMP TABLE tmp_dates
AS
SELECT
'2020-09-01':: DATE AS i_date
UNION ALL
SELECT
'2020-09-02':: DATE AS i_date
UNION ALL
SELECT
'2020-09-03':: DATE AS i_date
;
Dummy data:
DROP TABLE IF EXISTS table_name;
CREATE TEMP TABLE table_name
AS
SELECT
5 AS column_1,
4 AS column_2,
'2020-09-01'::DATE AS date_column_1,
'2020-09-01'::DATE AS date_column_2
UNION ALL
SELECT
8 AS column_1,
4 AS column_2,
'2020-09-01'::DATE AS date_column_1,
'2020-09-03'::DATE AS date_column_2
UNION ALL
SELECT
6 AS column_1,
4 AS column_2,
'2020-09-03'::DATE AS date_column_1,
'2020-09-03'::DATE AS date_column_2
;
Query to calculate avg.
SELECT
d.i_date AS result_date,
AVG((column_1 - column_2)::FLOAT) AS computed_result
FROM
table_name t
JOIN tmp_dates d ON (t.date_column_1 >= d.i_date AND t.date_column_2 <= d.i_date)
OR (t.date_column_1 <= d.i_date AND t.date_column_2 >= d.i_date)
GROUP BY
result_date
ORDER BY
result_date
;
PS For dates generation, if the list of dates is huge (e.g. a whole year - 365 days). You may wanna consider using a date generator like (this).
-- generate every day between two timestamps
CALL generate_ts_every_x_between_two_timestamps('2020-01-01 00:00:00' , '2020-01-30 00:00:00', 1, 'day');
CREATE TEMP TABLE tmp_dates AS SELECT ts::DATE as i_date FROM tmp_generated_time_series;