Don't want to double count in Filtered Aggregation-CodePudding

Sample Data:

shopper_id	last_purchase_timestamp	active_p30	active_p60	active_over_p90
1	2022-03-02 1:20:00	TRUE	TRUE	TRUE
2	2022-03-01 1:30:00	TRUE	TRUE	TRUE
3	2022-02-28 1:24:03	TRUE	TRUE	TRUE
4	2022-02-02 21:22:26	FALSE	TRUE	TRUE

I want to count if the shopper was active (as in made their last purchase) in the last 30 days (starting march 5th), last 60 days, etc.

My goal is to find how many shoppers bought their last item in the last 30 days, how many shoppers bought their last item in the last 60 days etc. However I do not want to double count.

What I've attempted:

AS total_active_p30,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day) 
AS total_active_p60,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day) AS 
total_active_p90

Results:

total_active_p30	total_active_p60	total_active_p90
3	4	4

However this is causing it to double count. How can I prevent it from double counting? The total number of counts should be 4.

My ideal output would be:

total_active_p30	total_active_p60	total_active_p90
3	1	0

Thanks in advance everyone! I'm using Trino!

CodePudding user response：

Your query has an incorrect logic condition. Because data of resulting this >= DATE 2022-03-05 - INTERVAL 90 day condition always are have data of resulting this >= DATE 2022-03-05 - INTERVAL 60 day condition too. For that, we must write our query:

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '30' day)) 
as total_active_p30,

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '60' day)
                            and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '30' day)) 
as total_active_p60,

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '90' day)
                        and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '60' day)) 
as total_active_p90

CodePudding user response：

Add both upper and lower bounds to the filter so they do not intersect. Something along this lines:

-- sample data
WITH dataset (last_purchase_timestamp) AS (
    VALUES (timestamp '2022-03-02 1:20:00'),
        (timestamp '2022-03-01 1:30:00'),
        (timestamp '2022-02-28 1:24:03'),
        (timestamp '2022-02-02 21:22:26')
)

-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
    count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
    count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90 
from dataset

Output:

total_active_p30	total_active_p60	total_active_p90
3	1	0