Home > other >  Can I count the number of dates in a column that occur before the minimum date in another column in
Can I count the number of dates in a column that occur before the minimum date in another column in

Time:01-26

I am working on some ecommerce data where a user's visits and their orders are kept in separate tables. I want find out how many visits a user makes to the site before they make their first purchase.

Right now I can get the order dates for a user and the count of visits before that date, but when I try to narrow it down to just the first visit I keep getting errors.

Is this possible?

My current code for one user is below. It shows a count of days visited before the order date:

SELECT ae.USER_KEY,
       ae.EMPLOYEE_NUMBER,
       min(ae.ORDER_ENTRY_DATE) as min_date,
       COUNT (DISTINCT DATE(ve.VISIT_DATE_TIMESTAMP))  AS VISITS
FROM ORDER_ACTIVITY ae , VISITS ve
WHERE DATE(VISIT_DATE_TIMESTAMP) <= ORDER_ENTRY_DATE
    AND ae.USER_KEY = '35316826'
    AND ve.USER_KEY = ae.USER_KEY
    AND ae.ORDER_ENTRY_DATE BETWEEN '01/01/2022' AND '12/31/2022'
    AND DATE(ve.VISIT_DATE_TIMESTAMP) BETWEEN '01/01/2022' AND '12/31/2022'
GROUP BY  ae.USER_KEY,
       ae.ORDER_KEY,
       ae.EMPLOYEE_NUMBER

and it returns

USER_KEY EMPLOYEE_NUMBER MIN_DATE VISITS
35,316,826 650125 2022-07-29 3
35,316,826 650125 2022-08-11 7
35,316,826 650125 2022-08-17 8
35,316,826 650125 2022-08-26 9
35,316,826 650125 2022-11-14 14
35,316,826 650125 2022-11-16 15
35,316,826 650125 2022-11-16 15
35,316,826 650125 2022-12-23 20

What I want it to return is just

USER_KEY EMPLOYEE_NUMBER MIN_DATE VISITS
35,316,826 650125 2022-07-29 3

Is this possible?

CodePudding user response:

I think the main issue is the GROUP BY clause is too specific (should not include ORDER_KEY), but there are other concerns as well.

One is to stop using the A,B join syntax that has been obsolete for more than 30 years now. That makes it difficult for us to know which of the WHERE conditions in this query belong with the JOIN, which absolutely makes a difference here.

Another issue is there's not a meaningful way in just this data to match up a visit to an order. Thankfully we only care about the first order, so we can work around that. If you wanted to show how many visits led up to each order since the prior order, we call that a "gaps and islands" problem, which can be even more complicated.

SELECT o.User_Key, o.Employee_Number, o.Order_Entry_Date, count(DISTINCT v.Visit_Date_Timestamp) As Visits
FROM (
    SELECT oa.USER_KEY,
        oa.EMPLOYEE_NUMBER,
        oa.Order_Entry_date,
        row_number() over (PARTITION BY oa.User_Key ORDER BY oa.Order_Entry_Date) rn
    FROM Order_Activity oa
) o
INNER JOIN Visits v ON v.USER_KEY = o.USER_KEY
        AND DATE(v.Visit_Date_Timestamp) <= o.Order_Entry_Date
WHERE o.ORDER_ENTRY_DATE >= '20220101' AND o.Order_Entry_Date < '20230101'
    AND o.rn = 1
GROUP BY o.User_Key, o.Employee_Number

CodePudding user response:

If you are using MySQL, can add Order By and Limit command.

ORDER BY min_date ASC (or ORDER BY visits ASC)
Limit 0, 1

CodePudding user response:

Yes you can. You do it in 2 steps. First make a query which gives you the min date which is the basis of the comparison for the second query. You need to design query 1 so that it just outputs the min date. Then you can test this query and check if the results are satisfactory.

The next step is to put the whole query into parenthesis and use it in the where clause of the second query:

select count(ve.VISIT_DATE_TIMESTAMP) 
from VISITS ve 
where ve.VISIT_DATE_TIMESTAMP < (query1) 
group by ve.VISIT_DATE_TIMESTAMP;

CodePudding user response:

You can use the USER_KEY column to join the tables, it is present in both tables:

select   ae.USER_KEY
       , ORDER_ENTRY_DATE as min_date,
       max (Visits)
from (

    select   USER_KEY
           , ORDER_ENTRY_DATE
           , firstOrder
        from (
            select  ROW_NUMBER() over (partition by USER_KEY order by USER_KEY, ORDER_ENTRY_DATE ) firstOrder  
                    , USER_KEY
                    , ORDER_ENTRY_DATE  
            from #ORDER_ACTIVITY
       )t where t.firstOrder = 1
 ) ae 
 left join (   
    
        select  ROW_NUMBER() over (partition by USER_KEY order by USER_KEY ) Visits
              , USER_KEY
              , VISIT_DATE_TIMESTAMP 
        from #VISITS    
 ) ve
on  ve.USER_KEY = ae.USER_KEY
where ae.USER_KEY = '35316826'
group by   ae.USER_KEY
        , ORDER_ENTRY_DATE
  • Related