Home > other >  SQL - Get historic count of rows collected within a certain period by date
SQL - Get historic count of rows collected within a certain period by date

Time:01-27

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this

id dates
1 2002-01-01
2 2002-01-01
3 2002-01-01
... ...
3 2023-01-10

If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this

with total_counter as (
select id, count(id) counts
    from source
    group by id
),
unique_obs as (
    select id
    from source
    where dates >= DATEADD(Day ,-30, current_date)
    group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;

The problem is that this results would return a single result for today's count as provided by current_date.

I would like to see a table with such counts as if for example I had ran this analysis yesterday, and the day before and so on. So the expected result would be something like

counts date
1235 2023-01-10
1234 2023-01-09
1265 2023-01-08
... ...
7383 2022-12-11

so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.

CodePudding user response:

SELECT count(*) AS Counts
    dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

CodePudding user response:

If you need a distinct count of Ids from the 30 days up to and including each date the below should work

WITH CTE_DATES
AS
(
    --Create a list of anchor dates
    SELECT  DISTINCT
            dates
    FROM    source
)

SELECT       COUNT(DISTINCT s.id)   AS "counts"
            ,D.dates                AS "date"

FROM        CTE_DATES   D
LEFT JOIN   source      S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE

GROUP BY    D.dates
ORDER BY    D.dates DESC
;

If the distinct count didnt matter you could likely simplify with a rolling sum, only hitting the source table once:

SELECT  S.dates     AS "date"
        ,COUNT(1)   AS "count_daily" 
        ,SUM("count_daily") OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day. 

FROM    source S

GROUP BY S.dates
ORDER BY S.dates DESC;
;

This wont work though if you have gaps in your list of dates as it'll just include the latest 30 days available. In which case the first example without distinct in the count will do the trick.

  • Related