Home > Back-end >  Get first row per group, plus related row from last year
Get first row per group, plus related row from last year

Time:10-21

Basically I have the following dataset coming from a query, the 'rn' is a partition to add row numbers for the next requirement:

WITH dataset AS (...)
-- gives

business_id | amount |  year |  month |   row_num
--------------------------------------------
1000          2000      2021      9       1
1000          1000      2021      9       2
1000          1500      2021      8       3
1000          1000      2020      9       4
1000          1000      2020      8       5
1043          4000      2021      8       1
1043          4500      2021      7       2
1043          4000      2021      6       3
1043          4100      2021      5       4
1043          4200      2021      4       5
1043          5000      2020      12      6
1043          5500      2020      11      7
1043          5600      2020      10      8
1043          5100      2020      9       9
1043          5300      2020      8       10
1043          5100      2020      7       11
1043          5000      2020      6       12
1139          4000      2021      9       1
1139          3000      2021      8       2
1139          2000      2021      7       3
1139          1000      2020      6       4

My first requirement is to get the latest year / month value for each business, which I can do by selecting rn = 1, like so:

WITH dataset AS (...)
SELECT * FROM dataset WHERE row_num = 1
-- gives

business_id | amount |  year |  month | row_num
-----------------------------------------------
1000          2000      2021      9       1
1043          4000      2021      8       1
1139          4000      2021      9       1

So far so good, the second requirement, which I'm not sure how to do, is to get the same record from last year to compare the current year to, so I would need this data set:

business_id | amount |  year |  month | row_num
-----------------------------------------------
1000          2000      2021      9       1
1000          1000      2020      9       4
1043          4000      2021      8       1
1043          5300      2020      8       10
1139          4000      2021      9       1

Any ideas how to do this? Notice the last record does not have a value for 2020, that is also a case.

CodePudding user response:

WITH dataset AS (...)
   , base    AS (SELECT * FROM dataset WHERE row_num = 1)
TABLE base
UNION ALL
SELECT d.*
FROM   base b
JOIN   dataset d ON d.business_id = b.business_id
                AND d.year = b.year - 1
                AND d.month = b.month

I added another CTE, and refer to it twice in the outer SELECT, using UNION ALL. You can add any ORDER BY you want, maybe ...

ORDER BY business_id, row_num

UNION ALL only adds the related row from last year if it exists.

Chances are, the query can be optimized. Getting the first row with row_number() is typically not king of performance. See:

About the TABLE shorthand:

  • Related