The table below has customer ID, product purchased, and purchase date.
customer_id products purchase_date
93738117783 product a 5/24/2022
93738117783 product a 6/8/2022
93738117783 product a 7/19/2022
93738117783 product a 8/18/2022
93738117783 product a 9/22/2022
93738117783 product a 10/19/2022
93738117783 product a 11/17/2022
93738117783 product a 12/27/2022
93738554027 product a 5/5/2021
93738738408 product b 8/2/2021
93738738408 product b 9/20/2021
93738738408 product b 10/26/2021
93738738408 product b 12/2/2021
93738738408 product b 1/2/2022
93738738408 product b 3/27/2022
93738738408 product b 5/2/2022
93738738408 product b 6/10/2022
93738738408 product b 7/8/2022
93738738408 product b 7/31/2022
93738117783 product a 8/1/2022
93738117783 product a 9/5/2022
93738117783 product a 10/8/2022
93738117783 product a 11/16/2022
93738117783 product a 12/19/2022
93738943799 product a 10/21/2020
93738943799 product a 11/20/2020
93738943799 product a 1/24/2021
93739310547 product b 5/3/2022
93739310547 product b 8/19/2022
93739310547 product b 1/5/2023
From this table, I want to create a SQL query to get the following output -
product a new_customers week_ending_Friday
product a 2 12/16/2022
product b 3 12/16/2022
product a 1 12/10/2022
product b 4 12/10/2022
new_customers = new to purchasing the product in 1yr from the purchase_date week_ending_Friday = date rolled upto week ending friday
Any idea will help me.
CodePudding user response:
First step: get a date's week's Friday. By ISO definition a week starts on Monday, so you'll have to add 4 days to the ISO week's start day to get Friday.
DATE_TRUNC('weekiso', purchase_date) INTERVAL '4 day'
Then, in order to find new customers, we would get the first purchse day per customer and product, a simple aggregation, but in your comments you say that you also consider a customer new, when their last puchase was made more than a year before. So, we need a lookup, which we do with [NOT] EXISTS
.
select
day as week_ending_friday,
product,
count_if(is_new) as new_customers
from
(
select
customer_id,
product,
date_trunc('weekiso', purchase_date) interval '4 day' as day,
not exists
(
select null
from purchases pp
where pp.customer_id = p.customer_id
and pp.product = p.product
and pp.purchase_date < p.purchase_date
and pp.purchase_date >= p.purchase_date - interval '1 year'
) as is_new
from purchases p
) evaluated
group by day, product
order by day, product;
CodePudding user response:
with cte as (select 93738117783 customer_id, 'product a' products, '5/24/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '6/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '7/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '8/18/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '9/22/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '10/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '11/17/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '12/27/2022'::date purchase_date
union all select 93738554027 customer_id, 'product a' products, '5/5/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '8/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '9/20/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '10/26/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '12/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '1/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '3/27/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '5/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '6/10/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '7/8/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '7/31/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '8/1/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '9/5/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '10/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '11/16/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '12/19/2022'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '10/21/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '11/20/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '1/24/2021'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '5/3/2022'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '8/19/2020'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '1/5/2023'::date purchase_date)
select
DATE_TRUNC('week', purchase_date) INTERVAL '4 day' week_number
,products
,array_unique_agg(customer_id)over (partition by week_number, products) new_custs
,array_size(new_custs) number_new_custs
from cte
qualify
datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date) 365,purchase_date),purchase_date)>=0