Pardon the title as I could not think of a good title for my problem.
I have a table as below
L_DATE | GRP | Counts |
---|---|---|
20.01.2023 | A | 100 |
21.01.2023 | A | 150 |
22.01.2023 | B | 200 |
20.01.2023 | C | 500 |
21.01.2023 | C | 800 |
22.01.2023 | C | 1200 |
The desired output is like this
GRP | Current Count | Last Count | Diff1 | Last2Last Count | Diff2 |
---|---|---|---|---|---|
A | 0 | 150 | -150 | 100 | -100 |
B | 200 | 0 | 200 | 0 | 200 |
C | 1200 | 800 | 400 | 500 | 700 |
where,
Current Count is the count of latest date - 22.01.2023
Last Count is the count of previous date - 21.01.2023
Last2Last Count is the count of last to last date - 20.01.2023
Diff1 is the difference between Current Count and Last Count
Diff2 is the difference between Current Count and Last2Last Count
0 appears where there is no data for that date, for example A does not have any record for latest date 22.01.2023 so its 'Current Count' is 0. Similarly B does not have any record for 21.01.2023 or 20.01.2023 so its 'Last Count' and 'Last2Last Count' is 0.
I have tried all sorts of joins but cannot achieve the desired results. Below is my latest code, which gives me result of C and B but not A.
select distinct
T1.GRP,
T1.Counts as "Current Count",
ifnull(T2.Counts,0) as "Last Count",
T1.Counts - T2.Counts as "Diff1",
ifnull(T3.Counts,0) as "Last2Last Count",
T1.Counts - T3.Counts as "Diff2"
from tbl T1
left join tbl T2 on (T2.L_DATE = '21.01.2023' and T2.GRP = T1.GRP)
left join tbl T3 on (T3.L_DATE = '20.01.2023' and T3.GRP = T1.GRP)
where T1.L_DATE = ('22.01.2023')
I tried to achieve it via GROUP_BY but did not succeed. Any help or guidance is appreciated.
CodePudding user response:
Generate test data
CREATE TABLE TEST (L_DATE DATE, GRP VARCHAR(1), COUNTS INTEGER);
INSERT INTO TEST VALUES ('20.1.2023', 'A', 100);
INSERT INTO TEST VALUES ('21.1.2023', 'A', 150);
INSERT INTO TEST VALUES ('22.1.2023', 'B', 200);
INSERT INTO TEST VALUES ('20.1.2023', 'C', 500);
INSERT INTO TEST VALUES ('21.1.2023', 'C', 800);
INSERT INTO TEST VALUES ('22.1.2023', 'C', 1200);
Next you need to "fill the empty lines". For dates you may want to use SERIES_GENERATE instead, if not all dates are present in the data.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT el.L_DATE, el.GRP, ifnull(t.COUNTS, 0) AS COUNTS
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
As you proposed, two self-joins based on this intermediate result would do the job. However, I would prefer to use window function LAG instead.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
Note that this gives you the desired result also for historical dates. You can add a WHERE condition for the current date. Also you can additionally calculate the differences:
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT L_DATE, GRP, COUNTS, LASTCOUNT, COUNTS-LASTCOUNT DIFF1, LAST2LASTCOUNT, COUNTS-LAST2LASTCOUNT DIFF2
FROM
(
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
)
WHERE L_DATE = '22.1.2023'