Get first date OR first date after gap-CodePudding

I have a large table of customers and their policies that are set up like this (the actual dates can be throughout the year):

Record	CustomerID	Name	PolicyStart	PolicyEnd	Agent	Status
1	12345	Jane Doe	20060101	20081231	John Smith	Term
2	12345	Jane Doe	20090101	20101231	Sarah Po	Term
3	12345	Jane Doe	20110101		Sarah Po	Active
4	12346	Joe Dirt	20130101	20141231	Blake Lee	Term
5	12346	Joe Dirt	20200101		Blake Lee	Active

I've been using a basic SQL query to get all the active accounts, by selecting the records where the status is "Active" and spitting out the list of active policies, like so:

SELECT * FROM Customers WHERE Status = 'Active'

Record	CustomerID	Name	PolicyStart	PolicyEnd	Agent	Status
3	12345	Jane Doe	20110101		Sarah Po	Active
5	12346	Joe Dirt	20200101		Blake Lee	Active

However, I've been asked to now provide a result like this:

Record	CustomerID	Name	OriginalEffectiveDate	Agent	Status
3	12345	Jane Doe	20060101	Sarah Po	Active
5	12346	Joe Dirt	20200101	Blake Lee	Active

The specific ask is to calculate the OriginalEffectiveDate column based on the policy start and end dates using this criteria: Use the earliest policy start date as the customer's OriginalEffectiveDate, as long as there has been no break in coverage of one month or greater. If there has been a gap (or gaps), use the first date after the latest gap instead.

So, Jane's original effective date would be 20060101, as that's when she first got coverage, and she has not had any breaks in coverage. However, Joe's original effective date would be 20200101, as he had a break (one month or greater) after his initial policy, and that is the first date after the gap.

I've attempted to mess around with CASE, MIN, MAX, and TOP statements, to no avail. I have no idea how to proceed. I believe in "teaching someone to fish", so please feel free to point me in the direction of a tutorial. The sticking point is the "one month or greater" and "after the latest gap" parts. I tried to search for a solution online, and I haven't come across anything similar, but perhaps I don't know the right way to phrase the question.

CodePudding user response：

One possible solution you can try is as follows using lag.

First get the gap in months between the current PolicyEnd and the next PolicyStart, it's possible to then find the most recent record for each customer where the gap was greater than 1 month.

Then you can select the minimum valid record for each customer after this point.

with gaps as (
  select *,
    case when DateDiff(month, PolicyEnd,Lead(PolicyStart) over(partition by CustomerID order by Record)) > 1 then 1 else 0 end b
  from t
), minrecord as (
  select CustomerID, Max(Record) mr
  from gaps
  where b = 1
  group by CustomerID
)
select Record, CustomerID, [Name], Agent, [Status], ed.OriginialEffectiveDate
from t
cross apply (
  select Min(PolicyStart) OriginialEffectiveDate
  from t t2
  where t2.CustomerID = t.CustomerID and
    t2.Record > IsNull((select mr from minrecord where CustomerID = t2.CustomerID), 0)
)ed
where t.[Status] = 'active';

See this demo Fiddle

CodePudding user response：

Here's what I ended up going with:

SELECT Record
    ,Customers.CustomerID
    ,Name
    ,CustomerEffectiveDate
    ,Agent
    ,Status
FROM Customers
JOIN (
    SELECT CustomerID
        ,MIN(PolicyStart) AS CustomerEffectiveDate
    FROM (
        SELECT CustomerID
            ,PolicyStart
            ,DateDiff(month, PolicyEnd, Lead(PolicyStart) OVER (
                    PARTITION BY CustomerID ORDER BY CustomerID
                        ,PolicyStart
                    )) AS gapdays
        FROM Customers
        ) gaps
    WHERE gapdays < 2
        OR gapdays IS NULL
    GROUP BY CustomerID
    ) minstart ON customers.CustomerID = minstart.CustomerID
WHERE status = 'Active'

It's probably not the most efficient code, but it seems to work.