Home > Software engineering >  Distinct New Customers in SQL
Distinct New Customers in SQL

Time:01-23

The table below has customer ID, product purchased, and purchase date.

customer_id products    purchase_date
93738117783 product a   5/24/2022
93738117783 product a   6/8/2022
93738117783 product a   7/19/2022
93738117783 product a   8/18/2022
93738117783 product a   9/22/2022
93738117783 product a   10/19/2022
93738117783 product a   11/17/2022
93738117783 product a   12/27/2022
93738554027 product a   5/5/2021
93738738408 product b   8/2/2021
93738738408 product b   9/20/2021
93738738408 product b   10/26/2021
93738738408 product b   12/2/2021
93738738408 product b   1/2/2022
93738738408 product b   3/27/2022
93738738408 product b   5/2/2022
93738738408 product b   6/10/2022
93738738408 product b   7/8/2022
93738738408 product b   7/31/2022
93738117783 product a   8/1/2022
93738117783 product a   9/5/2022
93738117783 product a   10/8/2022
93738117783 product a   11/16/2022
93738117783 product a   12/19/2022
93738943799 product a   10/21/2020
93738943799 product a   11/20/2020
93738943799 product a   1/24/2021
93739310547 product b   5/3/2022
93739310547 product b   8/19/2022
93739310547 product b   1/5/2023

From this table, I want to create a SQL query to get the following output -

product a   new_customers   week_ending_Friday
product a   2               12/16/2022
product b   3               12/16/2022
product a   1               12/10/2022
product b   4               12/10/2022

new_customers = new to purchasing the product in 1yr from the purchase_date week_ending_Friday = date rolled upto week ending friday

Any idea will help me.

CodePudding user response:

First step: get a date's week's Friday. By ISO definition a week starts on Monday, so you'll have to add 4 days to the ISO week's start day to get Friday.

DATE_TRUNC('weekiso', purchase_date)   INTERVAL '4 day'

Then, in order to find new customers, we would get the first purchse day per customer and product, a simple aggregation, but in your comments you say that you also consider a customer new, when their last puchase was made more than a year before. So, we need a lookup, which we do with [NOT] EXISTS.

select
  day as week_ending_friday,
  product,
  count_if(is_new) as new_customers
from
(
  select
    customer_id,
    product,
    date_trunc('weekiso', purchase_date)   interval '4 day' as day,
    not exists
    (
      select null
      from purchases pp
      where pp.customer_id = p.customer_id
      and pp.product = p.product
      and pp.purchase_date < p.purchase_date
      and pp.purchase_date >= p.purchase_date - interval '1 year'
    ) as is_new
  from purchases p
) evaluated
group by day, product
order by day, product;

CodePudding user response:

enter image description here

with cte as (select 93738117783 customer_id, 'product a' products,   '5/24/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '6/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '7/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '8/18/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '9/22/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '10/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '11/17/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '12/27/2022'::date purchase_date
union all select 93738554027 customer_id, 'product a' products,   '5/5/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '8/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '9/20/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '10/26/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '12/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '1/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '3/27/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '5/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '6/10/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '7/8/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '7/31/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '8/1/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '9/5/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '10/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '11/16/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '12/19/2022'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '10/21/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '11/20/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '1/24/2021'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '5/3/2022'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '8/19/2020'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '1/5/2023'::date purchase_date)

 select   
  DATE_TRUNC('week', purchase_date)   INTERVAL '4 day' week_number
 ,products
 ,array_unique_agg(customer_id)over (partition by week_number, products) new_custs
 ,array_size(new_custs) number_new_custs
from cte 
qualify 
    datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date) 365,purchase_date),purchase_date)>=0 
   
  • Related