I am working on some ecommerce data where a user's visits and their orders are kept in separate tables. I want find out how many visits a user makes to the site before they make their first purchase.
Right now I can get the order dates for a user and the count of visits before that date, but when I try to narrow it down to just the first visit I keep getting errors.
Is this possible?
My current code for one user is below. It shows a count of days visited before the order date:
SELECT ae.USER_KEY,
ae.EMPLOYEE_NUMBER,
min(ae.ORDER_ENTRY_DATE) as min_date,
COUNT (DISTINCT DATE(ve.VISIT_DATE_TIMESTAMP)) AS VISITS
FROM ORDER_ACTIVITY ae , VISITS ve
WHERE DATE(VISIT_DATE_TIMESTAMP) <= ORDER_ENTRY_DATE
AND ae.USER_KEY = '35316826'
AND ve.USER_KEY = ae.USER_KEY
AND ae.ORDER_ENTRY_DATE BETWEEN '01/01/2022' AND '12/31/2022'
AND DATE(ve.VISIT_DATE_TIMESTAMP) BETWEEN '01/01/2022' AND '12/31/2022'
GROUP BY ae.USER_KEY,
ae.ORDER_KEY,
ae.EMPLOYEE_NUMBER
and it returns
USER_KEY | EMPLOYEE_NUMBER | MIN_DATE | VISITS |
---|---|---|---|
35,316,826 | 650125 | 2022-07-29 | 3 |
35,316,826 | 650125 | 2022-08-11 | 7 |
35,316,826 | 650125 | 2022-08-17 | 8 |
35,316,826 | 650125 | 2022-08-26 | 9 |
35,316,826 | 650125 | 2022-11-14 | 14 |
35,316,826 | 650125 | 2022-11-16 | 15 |
35,316,826 | 650125 | 2022-11-16 | 15 |
35,316,826 | 650125 | 2022-12-23 | 20 |
What I want it to return is just
USER_KEY | EMPLOYEE_NUMBER | MIN_DATE | VISITS |
---|---|---|---|
35,316,826 | 650125 | 2022-07-29 | 3 |
Is this possible?
CodePudding user response:
I think the main issue is the GROUP BY
clause is too specific (should not include ORDER_KEY
), but there are other concerns as well.
One is to stop using the A,B
join syntax that has been obsolete for more than 30 years now. That makes it difficult for us to know which of the WHERE
conditions in this query belong with the JOIN, which absolutely makes a difference here.
Another issue is there's not a meaningful way in just this data to match up a visit to an order. Thankfully we only care about the first order, so we can work around that. If you wanted to show how many visits led up to each order since the prior order, we call that a "gaps and islands" problem, which can be even more complicated.
SELECT o.User_Key, o.Employee_Number, o.Order_Entry_Date, count(DISTINCT v.Visit_Date_Timestamp) As Visits
FROM (
SELECT oa.USER_KEY,
oa.EMPLOYEE_NUMBER,
oa.Order_Entry_date,
row_number() over (PARTITION BY oa.User_Key ORDER BY oa.Order_Entry_Date) rn
FROM Order_Activity oa
) o
INNER JOIN Visits v ON v.USER_KEY = o.USER_KEY
AND DATE(v.Visit_Date_Timestamp) <= o.Order_Entry_Date
WHERE o.ORDER_ENTRY_DATE >= '20220101' AND o.Order_Entry_Date < '20230101'
AND o.rn = 1
GROUP BY o.User_Key, o.Employee_Number
CodePudding user response:
If you are using MySQL, can add Order By and Limit command.
ORDER BY min_date ASC (or ORDER BY visits ASC)
Limit 0, 1
CodePudding user response:
Yes you can. You do it in 2 steps. First make a query which gives you the min date which is the basis of the comparison for the second query. You need to design query 1 so that it just outputs the min date. Then you can test this query and check if the results are satisfactory.
The next step is to put the whole query into parenthesis and use it in the where clause of the second query:
select count(ve.VISIT_DATE_TIMESTAMP)
from VISITS ve
where ve.VISIT_DATE_TIMESTAMP < (query1)
group by ve.VISIT_DATE_TIMESTAMP;
CodePudding user response:
You can use the USER_KEY column to join the tables, it is present in both tables:
select ae.USER_KEY
, ORDER_ENTRY_DATE as min_date,
max (Visits)
from (
select USER_KEY
, ORDER_ENTRY_DATE
, firstOrder
from (
select ROW_NUMBER() over (partition by USER_KEY order by USER_KEY, ORDER_ENTRY_DATE ) firstOrder
, USER_KEY
, ORDER_ENTRY_DATE
from #ORDER_ACTIVITY
)t where t.firstOrder = 1
) ae
left join (
select ROW_NUMBER() over (partition by USER_KEY order by USER_KEY ) Visits
, USER_KEY
, VISIT_DATE_TIMESTAMP
from #VISITS
) ve
on ve.USER_KEY = ae.USER_KEY
where ae.USER_KEY = '35316826'
group by ae.USER_KEY
, ORDER_ENTRY_DATE