Home > Net >  LIMIT / Filtering on LEFT JOIN
LIMIT / Filtering on LEFT JOIN

Time:09-28

I have two tables, one table is a list of purchases with revenue, purchase_time and a user id, the other table has a list of a list of campaign clicks with campaign_id, user_id, click_time. campaign_clicks essentially logs all clicks from a campaign, there can any number of clicks or none and these could happen anytime, before or after a purchase but what I need to do is determine which campaign_id was the last campaign to be clicked on before the purchase was made by any given user and what was the total revenue attributed to that campaign_id. I want to only attribute revenue to clicks that occurred within 3 days prior to the purchase.

purchases

date user_id revenue purchase_time
2020/09/01 10 30.0 2020/09/01 10:10:00 am
2020/09/01 20 15.0 2020/09/02 09:15:00 am
2020/09/01 30 25.0 2020/09/02 08:15:00 am

campaign_clicks

user_id campaign_id click_time
10 2 2020/09/01 10:01:00 am
10 1 2020/09/01 10:05:00 am
10 2 2020/09/01 10:20:00 am
20 2 2020/09/01 10:10:00 am
30 2 2020/09/01 07:30:00 am

desired result

date campaign_id revenue
2020/09/01 1 30.0
2020/09/01 2 25.0

purchase from user id 20 shouldn't be included because it occurred before the click_time. User 10 revenue should be attributed to campaign 2 because the click occurred just before the purchase.

My problem is the join I have is returning all the clicks which is inflating the revenue. The select in the inner join isn’t aware of the purchase time, I need to somehow filter and narrow down the clicks to a single click, the last click. I've tried using ROW_NUMBER() to apply an index but that doesn't allow me to filter out clicks that occur after the purchase.

This is where I’m at

SELECT  
  date
  ,ROUND(sum(revenue)) as revenue
  ,campaign_clicks.campaign_id
FROM 
    purchases                    
       
        LEFT JOIN ( 

                   SELECT                                   
                        campaign_id 
                        ,user_id
                        ,click_time                       
                   FROM 
                      campaign_clicks            
                   ORDER BY         
                      click_time DESC                             
                  ) AS clicks ON clicks.user_id = purchases.user_id 
WHERE 
  -- only select campaign clicks that occurred before the purchase                
  purchases.purchase_time > clicks.click_time

  -- only include clicks that occurred within 3 days of the purchase               
  AND DATEDIFF(minutes, clicks.click_time,purchases.purchase_time) <= (60*24*3)

  -- PROBLEM HERE - there can be still a number of other clicks that occurred before the purchase I need to filter to only the last one 
GROUP BY 
   date
 ,clicks.campaign_id

CodePudding user response:

Well you can achieve this using the following query. So basically, you can perform a INNER JOIN and filter out dates where the duration crosses 3 days within the ON clause itself.

Now coming to limiting to the last clicked campaign, it can be achieved using ROW_NUMBER function and setting the order of sequence to clicked_time DESC. This way the last clicked date before purchase will have a sequence no. of 1. You can then just filter out the records where row_number is greater than 1 by wrapping the result set in a outer query.

-- Outer query to select just the last click for a any given purchase
SELECT * FROM (
    SELECT p.date, p.purchase_time, c.click_time, c.campaign_id, p.revenue,
-- sequential row number for clicks sorted in descending order of date
    ROW_NUMBER() OVER(PARTITION BY c.user_id ORDER BY c.click_time DESC) AS row_num
    FROM purchases p
    INNER JOIN campaign_clicks c
    ON ( 
       c.user_id = p.user_id 
      --- only select clicks that occured before the purchase
      AND c.click_time<p.purchase_time
      --- only select the clicks that occurred 3 days prior (mins * hours * days )
      AND TIMESTAMPDIFF(MINUTE, c.click_time, p.purchase_time) <= (60*24*3)
    )
) res WHERE res.row_num=1
You can also check the results on the DB-Fiddle link

CodePudding user response:

Snowflake supports joining laterally. That is, on to a function or correlated sub-query. This allows you to join on to a query that returns just one row (per input row).

SELECT  
  purchases.date
 ,purchases.revenue
 ,clicks.campaign_id
FROM 
  purchases    
LEFT JOIN LATERAL
(
  SELECT
    campaign_id 
   ,user_id
   ,click_time                       
  FROM 
    campaign_clicks
  WHERE
            user_id = purchases.user_id
    -- only select campaign clicks that occurred before the purchase                
    AND click_time <  purchases.purchase_time
    -- only include clicks that occurred within 3 days of the purchase               
    AND click_time >= DATEADD(days, -3, purchases.purchase_time)
  ORDER BY
    click_time DESC
  LIMIT
    1                        
)
  AS clicks

(Please forgive typos, I'm on my phone.)

  • Related