Home > Enterprise >  Don't want to double count in Filtered Aggregation
Don't want to double count in Filtered Aggregation

Time:03-30

Sample Data:

shopper_id last_purchase_timestamp active_p30 active_p60 active_over_p90
1 2022-03-02 1:20:00 TRUE TRUE TRUE
2 2022-03-01 1:30:00 TRUE TRUE TRUE
3 2022-02-28 1:24:03 TRUE TRUE TRUE
4 2022-02-02 21:22:26 FALSE TRUE TRUE

I want to count if the shopper was active (as in made their last purchase) in the last 30 days (starting march 5th), last 60 days, etc.

My goal is to find how many shoppers bought their last item in the last 30 days, how many shoppers bought their last item in the last 60 days etc. However I do not want to double count.

What I've attempted:

AS total_active_p30,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day) 
AS total_active_p60,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day) AS 
total_active_p90 

Results:

total_active_p30 total_active_p60 total_active_p90
3 4 4

However this is causing it to double count. How can I prevent it from double counting? The total number of counts should be 4.

My ideal output would be:

total_active_p30 total_active_p60 total_active_p90
3 1 0

Thanks in advance everyone! I'm using Trino!

CodePudding user response:

Your query has an incorrect logic condition. Because data of resulting this >= DATE 2022-03-05 - INTERVAL 90 day condition always are have data of resulting this >= DATE 2022-03-05 - INTERVAL 60 day condition too. For that, we must write our query:

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '30' day)) 
as total_active_p30,

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '60' day)
                            and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '30' day)) 
as total_active_p60,

count(*) filter (where last_purchase_timestamp >= ('2022-03-05'::date - INTERVAL '90' day)
                        and last_purchase_timestamp < ('2022-03-05'::date - INTERVAL '60' day)) 
as total_active_p90

CodePudding user response:

Add both upper and lower bounds to the filter so they do not intersect. Something along this lines:

-- sample data
WITH dataset (last_purchase_timestamp) AS (
    VALUES (timestamp '2022-03-02 1:20:00'),
        (timestamp '2022-03-01 1:30:00'),
        (timestamp '2022-02-28 1:24:03'),
        (timestamp '2022-02-02 21:22:26')
)

-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
    count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
    count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90 
from dataset

Output:

total_active_p30 total_active_p60 total_active_p90
3 1 0
  • Related