I store events such as view, cart and purchase for my products. Now I would like to select only the products that were viewed before they were purchased.
I can't come up with an idea of how to do this in an optimal way.
Here is my table structure:
id | user_id | product_id | event_type | event_time |
---|---|---|---|---|
a113a813-10e1-4e4f-88a4-4eaf1be1b5af | 9bf46f1a-51df-44c0-8875-fe6d8d35d503 | 643 | view | 2020-12-21 12:19:13 |
581746f3-ac5c-4331-87fe-1a33ea526b5f | 9bf46f1a-51df-44c0-8875-fe6d8d35d503 | 643 | purchase | 2020-12-21 12:19:32 |
c4cf64b5-fdfd-4604-b1d8-3296d6da5a3a | f8cb84e7-b942-43e6-a7ba-48e8f8d2331e | 730 | view | 2020-12-21 13:14:06 |
7d22b26d-b3d0-4f1e-b615-2816dbd96785 | f8cb84e7-b942-43e6-a7ba-48e8f8d2331e | 730 | purchase | 2020-12-21 13:14:42 |
cb14e71c-982a-43ca-8f01-82223b3de4e5 | 04cfed3c-0356-4cc2-91b9-0a6833fdd44d | 728 | purchase | 2020-12-21 13:44:29 |
Edit: I use PostgreSQL 9
Answer:
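You can use a subquery with the LAG window function to get the previous event_type of each row for the same user_id and product_id, and then keep only the purchases whose previous event was a view.
Here is what the query might look like: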
```sql
SELECT
    id,
    user_id,
    product_id,
    event_type,
    event_time
FROM (
    SELECT
        id, user_id, product_id, event_type, event_time,
        LAG(event_type) OVER (PARTITION BY user_id, product_id ORDER BY event_time ASC) AS prev_event
    FROM data
) d
WHERE event_type = 'purchase' AND prev_event = 'view'
```
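A minimal sketch of the LAG approach, assuming SQLite 3.25+ (which supports window functions) as a stand-in for PostgreSQL and a hypothetical `data` table loaded with the question's rows (ids shortened). The extra `CART_ROWS` are invented to show the limitation: when another event sits between the view and the purchase, LAG only sees that intermediate event.

```python
import sqlite3

# Hypothetical rows modelled on the question's table (ids shortened).
ROWS = [
    ("a113", "u1", 643, "view", "2020-12-21 12:19:13"),
    ("5817", "u1", 643, "purchase", "2020-12-21 12:19:32"),
    ("c4cf", "u2", 730, "view", "2020-12-21 13:14:06"),
    ("7d22", "u2", 730, "purchase", "2020-12-21 13:14:42"),
    ("cb14", "u3", 728, "purchase", "2020-12-21 13:44:29"),
]

# Hypothetical extra user who adds product 900 to the cart between view and purchase.
CART_ROWS = [
    ("e001", "u4", 900, "view", "2020-12-21 14:00:00"),
    ("e002", "u4", 900, "cart", "2020-12-21 14:00:10"),
    ("e003", "u4", 900, "purchase", "2020-12-21 14:00:20"),
]

# Same logic as the answer's query, reduced to the product_id column.
LAG_QUERY = """
SELECT product_id
FROM (
    SELECT product_id, event_type,
           LAG(event_type) OVER (
               PARTITION BY user_id, product_id ORDER BY event_time
           ) AS prev_event
    FROM data
) d
WHERE event_type = 'purchase' AND prev_event = 'view'
ORDER BY product_id
"""


def make_db(rows):
    """Load rows into an in-memory `data` table (ISO timestamps sort as text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id TEXT, user_id TEXT, product_id INTEGER, "
        "event_type TEXT, event_time TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def purchased_right_after_view(rows):
    """Product ids whose purchase row is directly preceded by a view."""
    conn = make_db(rows)
    try:
        return [product_id for (product_id,) in conn.execute(LAG_QUERY)]
    finally:
        conn.close()


print(purchased_right_after_view(ROWS))              # [643, 730]
print(purchased_right_after_view(ROWS + CART_ROWS))  # [643, 730]: 900 is missed
```

Product 728 is excluded because it has no earlier event, and product 900 is excluded because its purchase is preceded by `cart`, not `view`.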
As per @siwymilek's comment, to support any number of other events between the view and the purchase, there is no need to look at the previous event_type value at all. It is enough to check that a row with event_type = 'view' exists for the same user_id and product_id with an event_time earlier than that of the purchase.
The query would look like this:
```sql
-- DISTINCT ON keeps the first row of each (user_id, product_id) group in
-- ORDER BY order, i.e. the latest purchase that has an earlier view.
SELECT DISTINCT ON (user_id, product_id)
    id AS last_id,
    user_id,
    product_id,
    event_time AS last_event_time
FROM data d
WHERE event_type = 'purchase' AND EXISTS (
    SELECT 1 FROM data
    WHERE user_id = d.user_id
      AND product_id = d.product_id
      AND event_type = 'view'
      AND event_time < d.event_time
)
ORDER BY user_id, product_id, event_time DESC
```
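A minimal sketch of the EXISTS check, again assuming SQLite as a stand-in for PostgreSQL. SQLite has no `DISTINCT ON`, so this sketch uses plain `SELECT DISTINCT` and only returns the qualifying (user_id, product_id) pairs. The rows are hypothetical: u4 has a `cart` event between view and purchase, and u5 views the product only after buying it.

```python
import sqlite3

# Hypothetical rows modelled on the question's table (ids shortened).
ROWS = [
    ("a113", "u1", 643, "view", "2020-12-21 12:19:13"),
    ("5817", "u1", 643, "purchase", "2020-12-21 12:19:32"),
    ("c4cf", "u2", 730, "view", "2020-12-21 13:14:06"),
    ("7d22", "u2", 730, "purchase", "2020-12-21 13:14:42"),
    ("cb14", "u3", 728, "purchase", "2020-12-21 13:44:29"),
    ("e001", "u4", 900, "view", "2020-12-21 14:00:00"),
    ("e002", "u4", 900, "cart", "2020-12-21 14:00:10"),
    ("e003", "u4", 900, "purchase", "2020-12-21 14:00:20"),
    ("f001", "u5", 555, "purchase", "2020-12-21 15:00:00"),
    ("f002", "u5", 555, "view", "2020-12-21 15:05:00"),
]

# Keep purchases for which an earlier view by the same user of the same product exists.
EXISTS_QUERY = """
SELECT DISTINCT d.user_id, d.product_id
FROM data d
WHERE d.event_type = 'purchase'
  AND EXISTS (
      SELECT 1 FROM data v
      WHERE v.user_id = d.user_id
        AND v.product_id = d.product_id
        AND v.event_type = 'view'
        AND v.event_time < d.event_time
  )
ORDER BY d.product_id
"""


def viewed_before_purchase(rows):
    """(user_id, product_id) pairs with a view earlier than a purchase."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE data (id TEXT, user_id TEXT, product_id INTEGER, "
            "event_type TEXT, event_time TEXT)"
        )
        conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(EXISTS_QUERY).fetchall()
    finally:
        conn.close()


print(viewed_before_purchase(ROWS))
# [('u1', 643), ('u2', 730), ('u4', 900)]
```

Unlike the LAG version, product 900 is found despite the intermediate `cart` event, while 555 is rejected because its only view comes after the purchase. An index on `(user_id, product_id, event_type, event_time)` would likely help the EXISTS lookup on a large table, though that depends on the data and should be checked with `EXPLAIN`.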
Answer:
I propose making two subqueries that get the first view and the first purchase of each product, and then joining them on product_id:
```sql
with
views as (select min(event_time) first_view, product_id from test where event_type = 'view' group by product_id),
purchases as (select min(event_time) first_purchase, product_id from test where event_type = 'purchase' group by product_id)
select * from views join purchases on views.product_id = purchases.product_id
where views.first_view < purchases.first_purchase;
```
See this db-fiddle. This also works if there are additional event types in the table (see @siwymilek's comment).
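A minimal sketch of the two-CTE approach, assuming SQLite as a stand-in for PostgreSQL and a hypothetical `test` table with invented rows. The sketch adds a `first_view < first_purchase` filter, an assumption about the intended comparison, so that only products first viewed before their first purchase remain.

```python
import sqlite3

# Hypothetical rows; product 321 is viewed by u6 but bought by a different user, u7.
ROWS = [
    ("a113", "u1", 643, "view", "2020-12-21 12:19:13"),
    ("5817", "u1", 643, "purchase", "2020-12-21 12:19:32"),
    ("cb14", "u3", 728, "purchase", "2020-12-21 13:44:29"),
    ("e001", "u4", 900, "view", "2020-12-21 14:00:00"),
    ("e002", "u4", 900, "cart", "2020-12-21 14:00:10"),
    ("e003", "u4", 900, "purchase", "2020-12-21 14:00:20"),
    ("g001", "u6", 321, "view", "2020-12-21 16:00:00"),
    ("g002", "u7", 321, "purchase", "2020-12-21 16:10:00"),
]

CTE_QUERY = """
WITH
views AS (SELECT MIN(event_time) AS first_view, product_id
          FROM test WHERE event_type = 'view' GROUP BY product_id),
purchases AS (SELECT MIN(event_time) AS first_purchase, product_id
              FROM test WHERE event_type = 'purchase' GROUP BY product_id)
SELECT views.product_id
FROM views JOIN purchases ON views.product_id = purchases.product_id
-- Assumed filter: first view must precede first purchase.
WHERE first_view < first_purchase
ORDER BY views.product_id
"""


def first_view_before_first_purchase(rows):
    """Product ids whose earliest view precedes their earliest purchase."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (id TEXT, user_id TEXT, product_id INTEGER, "
            "event_type TEXT, event_time TEXT)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)
        return [product_id for (product_id,) in conn.execute(CTE_QUERY)]
    finally:
        conn.close()


print(first_view_before_first_purchase(ROWS))  # [321, 643, 900]
```

Because both CTEs group only by product_id, product 321 qualifies even though the viewer and the buyer are different users. If the check should be per user, adding user_id to both GROUP BY clauses and to the join condition would be one way to do it.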