I am trying to get a daily count of distinct IDs in the last trailing 30 days.
The output should look like this:
DATE Count(Distinct ID) Trailing Last 30D Equation
2022-01-02 232 Distinct IDs in the last 30 days from 2022-01-02
2022-01-03 1231 [...]
2022-01-04 123
2022-01-05 242
2022-01-06 3432
However, from my query below - it does not return the correct distinct count of IDs from the last trailing 30 days. Can someone advise on how I should properly go about this? Is there a way to incorporate a window function to count the distinct count of IDs in the last trailing 30 days?
WITH DATE_SPINE AS (
SELECT d.DATE
FROM DATE d
WHERE d.DATE BETWEEN '2022-01-01' AND '2022-05-01'
)
SELECT d.DATE,
COUNT(DISTINCT IFF(u.EVENT_DATE >= d.DATE - INTERVAL '30 DAYS', ID, NULL)) AS TRAILING_30_MAU
FROM DATE_SPINE D
LEFT JOIN USERS u ON u.EVENT_DATE = d.DATE
GROUP BY 1
ORDER BY 1 DESC
CodePudding user response:
Join with a list of the dates between the_date-29
and the_date
, count:
with dates as (
select distinct o_orderdate a_date
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
)
select count(distinct o_custkey), a_date, count(distinct o_orderdate)
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS a
join dates b
on a.o_orderdate between b.a_date-29 and b.a_date
group by 2
order by 2
limit 100