I have a data set of monthly billing records, but I want to filter out newer accounts that don't have a 12 month history and I'm not really sure how to approach this.
I was thinking about something using row number = 12 but not sure how to do that at the level of detail for each account and not the entire dataset. Also saw a post here about having count(date) = 12, but I don't think I'm executing that correctly.
CodePudding user response:
You use the windowing capability of aggregate functions, specifying a PARTITION BY.
SELECT *
FROM (SELECT x.*,
COUNT(*) OVER (PARTITION BY account_no) cnt
FROM monthly_billing_records x)
WHERE cnt >= 12
What Oracle will do is generate that query block results, then make a second internal pass over those results and do a GROUP BY whatever your PARTITION BY column list is, compute the COUNT(*), and join that back to the results on the PARTITION BY columns. That way you can summarize multiple rows and report the result on each individual row. It can do a lot more complicated stuff than that, so read up on PARTITION BY, ROWS BETWEEN , etc..
CodePudding user response:
If you want the id
for accounts with 12 months of data (the current month and the preceding 11 months) then you can GROUP BY
the account's id
and COUNT
the DISTINCT
dates within each month (WHERE
the dates are within the expected time period) and then to only get those with a full 12 month history you can use a HAVING
clause:
SELECT id,
COUNT(DISTINCT TRUNC(dt, 'MM')) AS num_months
FROM table_name
WHERE dt < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 1)
AND dt >= ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -11)
GROUP BY id
HAVING COUNT(DISTINCT TRUNC(dt, 'MM')) = 12;
Which, for the sample data:
CREATE TABLE table_name (id, dt) AS
-- Every month for the current month and past 11 months
SELECT 1, ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 1 - LEVEL) FROM DUAL CONNECT BY LEVEL <= 12
UNION ALL
-- 12 months of historical data but not for the current month
SELECT 2, ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 0 - LEVEL) FROM DUAL CONNECT BY LEVEL <= 12
UNION ALL
-- Only 4 months of data
SELECT 3, ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 1 - LEVEL) FROM DUAL CONNECT BY LEVEL <= 4;
Outputs:
ID | NUM_MONTHS |
---|---|
1 | 12 |