I'm trying to calculate monthly averages of - customer recency (days since last order) and customer latency (days between last order and previous order).
Calculating recency on the last day of each reporting month.
Including customers in the monthly calculation where:
- max(order_date) is BETWEEN reporting mth_end_date (i.e. not in future) and 12 months prior.
- latency for the max(order_date) IS NOT NULL (i.e. it has a prior order)
It feels like this is reasonably simple (when I says it at least!), but I can't crack it, and feel I am missing something obvious.
I'd appreciate any support you can offer.
Below is my progress to date using sample data.
With input_data AS (
SELECT '#1238' as order_id, DATE('2021-12-15') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1201' as order_id, DATE('2021-10-10') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1198' as order_id, DATE('2021-07-05') as order_date, 'c12345' as cust_id, 20 as order_value
UNION ALL SELECT '#1134' as order_id, DATE('2020-10-15') as order_date, 'c12345' as cust_id, 10 as order_value
UNION ALL SELECT '#1112' as order_id, DATE('2019-08-10') as order_date, 'c12345' as cust_id, 5 as order_value
UNION ALL SELECT '#1234' as order_id, DATE('2021-07-05') as order_date, 'c11111' as cust_id, 118 as order_value
UNION ALL SELECT '#1294' as order_id, DATE('2021-01-05') as order_date, 'c11111' as cust_id, 68 as order_value
UNION ALL SELECT '#1290' as order_id, DATE('2021-01-01') as order_date, 'c11111' as cust_id, 82 as order_value
UNION ALL SELECT '#1284' as order_id, DATE('2020-01-15') as order_date, 'c22222' as cust_id, 98 as order_value),
enhanced_data AS (
SELECT
*
,DATE_DIFF(order_date, LAG(order_date) OVER (PARTITION BY cust_id ORDER BY order_date), day) AS order_latency_days,
from input_data
),
dates AS (
SELECT
LAST_DAY(array_date, month) AS mth_end_date
FROM
UNNEST(GENERATE_DATE_ARRAY(
(SELECT MIN(order_date) FROM enhanced_data),
(SELECT MAX(order_date) FROM enhanced_data), INTERVAL 1 MONTH)) as array_date)
Select
d.mth_end_date
, avg(e.order_latency_days) as avg_latency
from dates d
left join enhanced_data e
ON Date_trunc(d.mth_end_date, Month) > Date_trunc(e.order_date, Month)
WHERE e.order_latency_days IS NOT NULL
group by 1
order by 1
I would expect the values for Dec-2021 to be (showing workings):
CodePudding user response:
Here's some pseudo-code that I think will help:
with allorders as (
select *, lag(orderdate) over (order by orderdate) as lastorderdate
from T
-- this is probably more efficient
where orderdate between <one year ago> and <end of current month>
and exists (
select 1 from T t2
-- this query isn't restricted by the one-year range
where t2.customerid = T.customerid and t2.orderdate < T.orderdate
)
)
select customerid, avg(<difference in lastorderdate and orderdate>)
from allorders
group by customerid
This ought to work but it might be slow:
with recentorders as (
select *, lag(orderdate) over (order by orderdate) as lastorderdate
from T
)
select customerid, avg(<difference in lastorderdate and orderdate>)
from recentorders
where orderdate between <one year ago> and <end of current month>
and lastorderdate is not null
group by customerid
CodePudding user response:
It's probably not optimal but I have a working solution in case it is a helpful starting point for others. Thanks @shawnt00 for your input!
With input_data AS (
SELECT '#1238' as order_id, DATE('2021-12-15') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1201' as order_id, DATE('2021-10-10') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1198' as order_id, DATE('2021-07-05') as order_date, 'c12345' as cust_id, 20 as order_value
UNION ALL SELECT '#1134' as order_id, DATE('2020-10-15') as order_date, 'c12345' as cust_id, 10 as order_value
UNION ALL SELECT '#1112' as order_id, DATE('2019-08-10') as order_date, 'c12345' as cust_id, 5 as order_value
UNION ALL SELECT '#1234' as order_id, DATE('2021-07-05') as order_date, 'c11111' as cust_id, 118 as order_value
UNION ALL SELECT '#1294' as order_id, DATE('2021-01-05') as order_date, 'c11111' as cust_id, 68 as order_value
UNION ALL SELECT '#1290' as order_id, DATE('2021-01-01') as order_date, 'c11111' as cust_id, 82 as order_value
UNION ALL SELECT '#1284' as order_id, DATE('2020-01-15') as order_date, 'c22222' as cust_id, 98 as order_value),
-- for each order record add the date of that customers prior order
-- exclude all orders in the future (relative to the report date)
enhanced_orders AS (
SELECT
*
,lag(order_date) OVER (PARTITION by cust_id ORDER BY order_date) AS cust_prev_order_date
FROM input_data
WHERE order_date < Date('2021-11-30')),
-- for each customer determine the most recent, and the next most recent, order dates
cust_calc as (
SELECT
cust_id
, max(order_date) as max_order_date
, max(cust_prev_order_date) as max_prev_order_date
FROM enhanced_orders
GROUP by 1),
-- for each customer calculate how many days ago their most recent order was from the report date,
-- and calculate the days between their last two most recent orders
-- exclude customers if their most recent order was >12m ago, or they don't have an order prior to their most recent one
cust_calc2 AS (
SELECT
cust_id
, date_diff(Date('2021-11-30'), max_order_date, day) as recency
, date_diff(max_order_date, max_prev_order_date, day) as latency
FROM cust_calc
WHERE max_prev_order_date IS NOT NULL
AND max_order_date > date_sub(Date('2021-11-30'), INTERVAL 1 YEAR))
-- average the recency and latency of all customers at the report date
SELECT
avg(recency) AS monthly_recency
,avg(latency) AS monthly_latency
FROM cust_calc2