I have a dataset that has 5 columns. Each account can have multiple rows. I need to group the data by C2 and Monthyear and find counts
ACC_ID, C1 , C2, EFF_START_DATE, EFF_END_DATE
111 , 0 , A , 2018-01-01, 2499-12-31
222 , 0 , A , 2018-02-15 , 2018-03-15
222 , 0 , B , 2018-03-16, 2499-12-31
333 , 0, A, 2000-01-01, 2499-12-31
I need to group this by months and find count for each month. So if someone has 2018-01-01 as EFF_STA_DTE and 2499-12-31 as EFF_END_DATE. They should be a part of all the months starting 2018.
Similarly if someone has 2018-02-15 as EFF_STA_DTE and 2018-03-15 as EFF_END_DATE their count should only reflect for Feb and March 2018.
Also I am only trying to get a count starting 2018 even if eff_start_Date is in past. So 333 in above case will have count 1 in 2018 and henceforth
Tried to extract Month year and do the count based on eff_start_Date but that is giving incorrect result.
Expected Output in above case
MONTH, C2, COUNT
JAN-18, A, 2. -- FOR ACCOUNT 111 ,333
FEB-18, A , 3. -- FOR ACCOUNT 111,222,333
MARCH-18, A, 1 -- FOR ACCOUNT 111,222,333
MARCH-18, B, 1. -- FOR ACCOUNT 222
CodePudding user response:
The most efficient way utilizes Teradata's EXPAND ON
extension to Standard SQL:
WITH cte AS
(
SELECT -- first of month
Trunc(BEGIN(pd), 'mon') AS mon
,C2
FROM tab
-- create a period on-the-fly, adjust the end date as periods exclude the end
EXPAND ON PERIOD(EFF_START_DATE, Next(EFF_END_DATE)) AS pd
-- return one row per month
BY ANCHOR PERIOD MONTH_BEGIN
-- restrict output to a specifc range
FOR PERIOD (date '2018-01-01', date '2018-03-31')
)
SELECT mon, C2, Count(*)
FROM cte
GROUP BY 1,2
ORDER BY 1,2
;