I have a table of events which records actions the customer takes in our website. I want to find out how many times a customer visited before he finally purchases an item.
The above table will be aggregated as
In the first week customerid 1 made 3 visits (including the visit in which he made a purchase). Again he made a purchase in the same week in another visit. So you can see 3 in the first case and 1 in second case. That is every time the user makes a purchase the count should be reset.
The solution i came up with is very messy, slow (involved multiple joins and 3 windows function) and it is not working in some cases. I am missing some data.
It would be great if someone can help me in the right direction on how to approach this scenario.
Thanks in advance.
CodePudding user response:
Try this:
WITH
-- your input, don't use in final query
visits(wk,visit_id,cust_id,has_purchased) AS (
SELECT 1,1,1,FALSE
UNION ALL SELECT 1,2,1,FALSE
UNION ALL SELECT 1,3,1,TRUE
UNION ALL SELECT 1,2,1,TRUE
)
-- real query starts here, replace following comma with "WITH"
,
with_counter AS (
SELECT
*
, LAG(CASE WHEN has_purchased THEN 1 ELSE 0 END,1,0)
OVER(PARTITION BY wk,cust_id ORDER BY visit_id) AS grp_end
FROM visits
)
SELECT
wk
, cust_id
, grp_end
, COUNT(*) AS visits_before_purchase
FROM with_counter
GROUP BY
wk
, cust_id
, grp_end
;
-- out wk | cust_id | grp_end | visits_before_purchase
-- out ---- --------- --------- ------------------------
-- out 1 | 1 | 0 | 3
-- out 1 | 1 | 1 | 1
CodePudding user response:
I'm assuming that each the each time a customer visit, their visit id will increase by 1 each time. So I just took the difference between the visit id for each customer to find out how many visit they made before purchasing something.
SELECT weeks, visit_id, customer_id, purchase_flag,
CASE WHEN diff IS null then visit_id else diff
end
FROM (
SELECT *, visit_id - LAG(visit_id) over (partition by customer_id order by
customer_id, visit_id)as diff
FROM customer
WHERE purchase_flag = 1
) as t ORDER BY customer_id, visit_id