Sample Data:
shopper_id | last_purchase_timestamp | active_p30 | active_p60 | active_over_p90 |
---|---|---|---|---|
1 | 2022-03-02 1:20:00 | TRUE | TRUE | TRUE |
2 | 2022-03-01 1:30:00 | TRUE | TRUE | TRUE |
3 | 2022-02-28 1:24:03 | TRUE | TRUE | TRUE |
4 | 2022-02-02 21:22:26 | FALSE | TRUE | TRUE |
I want to count if the shopper was active (as in made their last purchase) in the last 30 days (starting march 5th), last 60 days, etc.
My goal is to find how many shoppers bought their last item in the last 30 days, how many shoppers bought their last item in the last 60 days etc. However I do not want to double count.
What I've attempted:
AS total_active_p30,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day)
AS total_active_p60,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day) AS
total_active_p90
Results:
total_active_p30 | total_active_p60 | total_active_p90 | |
---|---|---|---|
3 | 4 | 4 |
However this is causing it to double count. How can I prevent it from double counting? The total number of counts should be 4.
My ideal output would be:
total_active_p30 | total_active_p60 | total_active_p90 | |
---|---|---|---|
3 | 1 | 0 |
Thanks in advance everyone! I'm using Trino!
CodePudding user response:
Your query has an incorrect logic condition. Because data of resulting this >= DATE 2022-03-05 - INTERVAL 90 day
condition always are have data of resulting this >= DATE 2022-03-05 - INTERVAL 60 day
condition too. For that, we must write our query:
count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '30' day))
as total_active_p30,
count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '60' day)
and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '30' day))
as total_active_p60,
count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '90' day)
and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '60' day))
as total_active_p90
CodePudding user response:
Add both upper and lower bounds to the filter so they do not intersect. Something along this lines:
-- sample data
WITH dataset (last_purchase_timestamp) AS (
VALUES (timestamp '2022-03-02 1:20:00'),
(timestamp '2022-03-01 1:30:00'),
(timestamp '2022-02-28 1:24:03'),
(timestamp '2022-02-02 21:22:26')
)
-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90
from dataset
Output:
total_active_p30 | total_active_p60 | total_active_p90 |
---|---|---|
3 | 1 | 0 |