I have a dataset that's just a list of orders made by customers each day.
order_date | month | week | customer |
---|---|---|---|
2022-10-06 | 10 | 40 | Paul |
2022-10-06 | 10 | 40 | Edward |
2022-10-01 | 10 | 39 | Erick |
2022-09-26 | 9 | 39 | Divine |
2022-09-23 | 9 | 38 | Alice |
2022-09-21 | 9 | 38 | Evelyn |
My goal is to calculate the total number of unique customers within a two-week period. I can count the number of customers within a month or week period but not two weeks. Also, the two weeks are in a rolling order such that weeks 40 and 39 (as in the sample above) is one window period while weeks 39 and 38 is the next frame.
So far, this is how I am getting the monthly and weekly numbers. Assume that the customer names are distinct per day.
select order_date,
month,
week,
COUNT(DISTINCT customer) over (partition by month) month_active_outlets,
COUNT(DISTINCT customer) OVER (partition by week) week active outlets,
from table
Again, I am unable to calculate the unique customer names within a two-week period.
CodePudding user response:
I think the easiest would be to create your own grouper in a subquery and then use that to get to your count. Currently, COUNT UNIQUE and ORDER BY in the window is not supported, therefore that approach wouldn't work.
A possible query could be:
WITH
week_before AS (
SELECT
EXTRACT(WEEK from order_date) as week, --to be sure this is the same week format
month,
CONCAT(week,'-', EXTRACT(WEEK FROM DATE_SUB(order_date, INTERVAL 7 DAY))) AS two_weeks,
customer
FROM
`test`.`Basic`)
SELECT
two_weeks,
COUNT(DISTINCT customer) AS unique_customer
FROM
week_before
GROUP BY
two_weeks
CodePudding user response:
The window
function is the right tool. To obtain the 2 week date, we first extract the week number of the year:
mod(extract(week from order_date),2)
If the week number is odd (modulo 2) we add a week. Then we trunc to the start of (the even) week.
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week )
with tbl as
(Select date("2022-10-06") as order_date, "Paul" as customer
union all select date("2022-10-06"),"Edward"
union all select date("2022-10-01"),"Erick"
union all select date("2022-09-26"),"Divine"
union all select date("2022-09-23"),"Alice"
union all select date("2022-09-21"),"Evelyn"
)
select *,
date_trunc(order_date,month) as month,
date_trunc(order_date,week) as week,
COUNT(DISTINCT customer) OVER week2 as customer_2weeks,
string_agg(cast(order_date as string)) over week2 as list_2weeks,
from tbl
window week2 as (partition by date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week ))
The first days of a year are counted to the last week of the previous year:
select order_date,
extract(isoweek from order_date),
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week)
from
unnest(generate_date_array(date("2021-12-01"),date("2023-01-14"))) order_date
order by 1