Home > Back-end >  How to create a SQL window function to flag a first transaction for a 30 day window?
How to create a SQL window function to flag a first transaction for a 30 day window?

Time:08-26

I am trying to build a flag in a SQL query which will identify the record which starts a new 30 day window. The idea is when a customer has a transaction to be able to track how many more transaction were within the next 30 days.

When I first created a query I used the lag function to see if there was a transaction within the 30 days prior of it but that is when I ran in to the issue were there may be a transaction within the past 30 days but that transaction was already part of a 'previous' 30 day window.

In the picture below I would want to identify the rows with arrows (lines 1, 4, 5) as they are transactions which started a new 30 day window.

screenshot identifying records to flag

My current query does not correctly identify line 4 as a new start as there is a transaction within the past 31 days.

CodePudding user response:

To get the "flag" value (let's call it window_id), you can use a recursive CTE that calculates the date difference from the "window" start date and assigns the "windows" start dates and window_id values to the rows.

You can generate "window" ids using a query like this

WITH rn AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
    FROM transactions
),
window AS (
    SELECT 
        *,
        transaction_date AS window_start,
        1 AS window_id
    FROM rn
    WHERE rn = 1
    UNION ALL
    SELECT
        rn.*,
        CASE WHEN DATEDIFF(day, w.window_start, rn.transaction_date) > 30
             THEN rn.transaction_date
             ELSE w.window_start
        END,
        CASE WHEN DATEDIFF(day, w.window_start, rn.transaction_date) > 30
             THEN w.window_id   1
             ELSE w.window_id
        END
    FROM rn 
    JOIN window w ON rn.customer_id = w.customer_id AND rn.rn = w.rn   1
)
SELECT 
    customer_id,
    transaction_id,
    transaction_date,
    window_id
FROM window  

Here, the first CTE is used to generate sequential row numbers for customer transactions, since in real life transaction ids are not sequental and can be messed with other customer transactions. This row numbers are then used in the second recursive CTE.

Result

customer_id transaction_id transaction_date window_id
9 4 2021-03-17 1
9 5 2021-04-01 1
9 6 2021-04-12 1
9 7 2021-05-10 2
9 8 2021-06-15 3

db<>fiddle here

  • Related