Scenario: Trying to count more active users for time series analysis.
Need: With postgreSQL(redshift) Count customers that have more than X unique transactions within Y days from said date, group by date.
How do i achieve this?
Table: orders
date | user_id | product_id | transaction_id |
---|---|---|---|
2022-01-01 | 001 | 003 | 001 |
2022-01-02 | 002 | 001 | 002 |
2022-03-01 | 003 | 001 | 003 |
2022-03-01 | 003 | 002 | 003 |
... | ... | ... | ... |
Outcome:
date | active_customers |
---|---|
2022-01-01 | 10 |
2022-01-02 | 12 |
2022-01-03 | 9 |
2022-01-04 | 13 |
CodePudding user response:
You may be able to use the window functions LEAD()
and LAG()
here but this solution may also work for you.
WITH data AS
(
SELECT o.date
, o.user_id
, COUNT(o.trans_id) tcount
FROM orders o
WHERE o.date BETWEEN o.date - '30 DAYS'::INTERVAL AND o.date -- Y days from given date
GROUP BY o.date, o.user_id
), user_transaction_count AS
(
SELECT d.date
, COUNT(d.user_id) FILTER (WHERE d.tcount > 1) -- X number of transactions
OVER (PARTITION BY d.user_id) user_count
FROM data d
)
SELECT u.date
, SUM(u.user_count) active_customers
FROM user_transaction_count u
GROUP BY u.date
ORDER BY u.date
;
Here is a DBFiddle that demos a couple options.