Home > OS >  SQL self join to get count difference between records
SQL self join to get count difference between records

Time:01-28

Pardon the title as I could not think of a good title for my problem.
I have a table as below

L_DATE GRP Counts
20.01.2023 A 100
21.01.2023 A 150
22.01.2023 B 200
20.01.2023 C 500
21.01.2023 C 800
22.01.2023 C 1200

The desired output is like this

GRP Current Count Last Count Diff1 Last2Last Count Diff2
A 0 150 -150 100 -100
B 200 0 200 0 200
C 1200 800 400 500 700

where,
Current Count is the count of latest date - 22.01.2023
Last Count is the count of previous date - 21.01.2023
Last2Last Count is the count of last to last date - 20.01.2023
Diff1 is the difference between Current Count and Last Count
Diff2 is the difference between Current Count and Last2Last Count
0 appears where there is no data for that date, for example A does not have any record for latest date 22.01.2023 so its 'Current Count' is 0. Similarly B does not have any record for 21.01.2023 or 20.01.2023 so its 'Last Count' and 'Last2Last Count' is 0.

I have tried all sorts of joins but cannot achieve the desired results. Below is my latest code, which gives me result of C and B but not A.

select distinct 
T1.GRP,
T1.Counts as "Current Count",
ifnull(T2.Counts,0) as "Last Count",
T1.Counts - T2.Counts as "Diff1",
ifnull(T3.Counts,0) as "Last2Last Count",
T1.Counts - T3.Counts as "Diff2"
from tbl T1
left  join tbl T2  on (T2.L_DATE = '21.01.2023' and T2.GRP = T1.GRP)
left  join tbl T3  on (T3.L_DATE = '20.01.2023' and T3.GRP = T1.GRP)
where T1.L_DATE = ('22.01.2023') 

I tried to achieve it via GROUP_BY but did not succeed. Any help or guidance is appreciated.

CodePudding user response:

Generate test data

CREATE TABLE TEST (L_DATE DATE, GRP VARCHAR(1), COUNTS INTEGER);
INSERT INTO TEST VALUES ('20.1.2023', 'A', 100);
INSERT INTO TEST VALUES ('21.1.2023', 'A', 150);
INSERT INTO TEST VALUES ('22.1.2023', 'B', 200);
INSERT INTO TEST VALUES ('20.1.2023', 'C', 500);
INSERT INTO TEST VALUES ('21.1.2023', 'C', 800);
INSERT INTO TEST VALUES ('22.1.2023', 'C', 1200);

Next you need to "fill the empty lines". For dates you may want to use SERIES_GENERATE instead, if not all dates are present in the data.

WITH expected_lines AS (
    SELECT DISTINCT a.L_DATE, b.GRP
    FROM TEST a, TEST b
)
SELECT el.L_DATE, el.GRP, ifnull(t.COUNTS, 0) AS COUNTS
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP

As you proposed, two self-joins based on this intermediate result would do the job. However, I would prefer to use window function LAG instead.

WITH expected_lines AS (
    SELECT DISTINCT a.L_DATE, b.GRP
    FROM TEST a, TEST b
)
SELECT 
    el.L_DATE, 
    el.GRP, 
    ifnull(t.COUNTS, 0) AS COUNTS,
    LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
    LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP

Note that this gives you the desired result also for historical dates. You can add a WHERE condition for the current date. Also you can additionally calculate the differences:

WITH expected_lines AS (
    SELECT DISTINCT a.L_DATE, b.GRP
    FROM TEST a, TEST b
)
SELECT L_DATE, GRP, COUNTS, LASTCOUNT, COUNTS-LASTCOUNT DIFF1, LAST2LASTCOUNT, COUNTS-LAST2LASTCOUNT DIFF2
FROM
(
    SELECT 
        el.L_DATE, 
        el.GRP, 
        ifnull(t.COUNTS, 0) AS COUNTS,
        LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
        LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
    FROM expected_lines el
    LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
)
WHERE L_DATE = '22.1.2023'
  • Related