Home > Back-end >  Converting PostgreSQL recursive CTE to SQL Server
Converting PostgreSQL recursive CTE to SQL Server

Time:10-15

I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data"

This is the working PostgreSQL code:

with recursive
    active_period_params as (
        select interval '30 days'  as allowed_gap,
        '2021-09-30'::date as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= (e.start_date-allowed_gap)::date  
    )
select account_id, min(start_date) as start_date
from active
group by account_id

This is my attempt at converting to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.

with
    active_period_params as (
        select 30 as allowed_gap,
        cast('2021-09-30' as date) as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION ALL
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
    )
select account_id, min(start_date) as start_date
from active
group by account_id

The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:

subscription
 ---------------------------------------------------
|  id  |  account_id  |  start_date  |   end_date   |
 ---------------------------------------------------
|   6  |      15      |  01/06/2021  |    null    |
|   5  |      15      |  01/01/2021  |    null    |
|   4  |      15      |  01/06/2020  | 01/02/2021 |
|   3  |      15      |  01/04/2020  | 15/05/2020 |
|   2  |      15      |  01/03/2020  | 15/05/2020 |
|   1  |      15      |  01/06/2019  | 01/01/2020 |

Expected query result (as produced by PostgreSQL code):

 ------------------------------
|  account_id  |  start_date  |
 ------------------------------
|      15      |  01/03/2020  |

Issue: The SQL Server code above gets stuck in a loop and doesn't produce a result.

Description of the PostgreSQL code:

  1. anchor block finds subs that are active as at the calc_date (30/09/2021) (id 5 & 6), and returns the min start_date (01/01/2021)
  2. the recursion block then looks for any earlier subs that existed within the allowed_gap, which is 30 days prior to the min_start date found in 1). id 4 meets this criteria, so the new min start_date is 01/06/2020
  3. recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (id 2 & 3), the new min start_date is 01/03/2020
  4. recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days)
  5. query returns a start date of 01/03/2020 for account_id 15

Any help appreciated!

CodePudding user response:

It seems the issue is related to the way SQL Server deals with recursive CTEs.

This is a type of gaps-and-islands problem, and does not actually require recursion.

There are a number of solutions, here is one. Given your requirement, there may be more efficient methods, but this should get you started.

  • Using LAG we identify rows which are within the specified gap of the next row
  • We use a running COUNT to give each consecutive set of rows an ID
  • We group by that ID, and take the minimum start_date, filtering out non-qualifying groups
  • Group again to get the minimum per account
DECLARE @allowed_gap int = 30,
        @calc_date datetime = cast('2021-09-30' as date);

WITH PrevValues AS (
    SELECT *,
      IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
                     ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
                     THEN 1 END
    FROM subscription
),
Groups AS (
    SELECT *,
      GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
                     ORDER BY start_date ROWS UNBOUNDED PRECEDING)
    FROM PrevValues
),
ByGroup AS (
    SELECT
      account_id,
      GroupId,
      start_date = MIN(start_date)
    FROM Groups
    GROUP BY account_id, GroupId
    HAVING COUNT(CASE WHEN start_date <= @calc_date    
            and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
  account_id,
  start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;

db<>fiddle

  • Related