Home > database >  How to recursively calculate the difference in departures from the previous month
How to recursively calculate the difference in departures from the previous month

Time:10-25

Let's say I have a table labeled employees and it looks like so

date employee id
4/1/2022 9
4/1/2022 8
3/1/2022 9
3/1/2022 8
3/1/2022 7
2/1/2022 9
2/1/2022 8
2/1/2022 7
1/1/2022 9
1/1/2022 8
1/1/2022 7
1/1/2022 6
1/1/2022 5

I'd like for my final output to look like below

date departures
4/1/2022 NULL
3/1/2022 1
2/1/2022 0
1/1/2022 2

Essentially, if the employee is not found on the next upcoming month, it's counting it as a departure for the current month. I already am calculating this with the following query:

WITH MONTHLY_COUNT(DTE, EMP_COUNT) AS(
SELECT
    [DATE],
    COUNT(1)    AS [COUNT]
FROM
    employees A
GROUP BY
    [DATE]),raw_data(EMPLOYEE, DATE) AS(
SELECT
    A.EMPLOYEE,
    [DATE]
FROM
    employees A),RANKING(DTE, EMP_COUNT, RANKING) AS(
SELECT
    A.DTE,
    A.EMP_COUNT,
    A.RANKING
FROM(

    SELECT
        TOP(6) ---pick the most recent x months
        A.DTE,
        A.EMP_COUNT,
        RANK() OVER(ORDER BY A.DTE DESC) AS [RANK]
    FROM 
        MONTHLY_COUNT AS A
)A(DTE, EMP_COUNT, RANKING))SELECT
    A.DTE,
    CASE
        WHEN A.DTE = B.DTE THEN B.SEPARATIONS
        WHEN A.DTE = C.DTE THEN C.SEPARATIONS
        WHEN A.DTE = D.DTE THEN D.SEPARATIONS
    END AS SEPARATIONS

FROM(
    SELECT
        TOP(6) --pick the most recent x months
        A.DTE
    FROM
        MONTHLY_COUNT A
    ORDER BY
        A.DTE DESC
)A

--COMPARE MONTH RANK 2 AND 1
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '2'  ----this is what I want to automate

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE  
        WHERE A.RANKING = '1' ----this is what I want to automate
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE

)B ON B.DTE = A.DTE

--COMPARE MONTH RANK 3 AND 2
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '3'

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE
        WHERE A.RANKING = '2'
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE

)C ON C.DTE = A.DTE
--COMPARE MONTH RANK 4 AND 3
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '4'

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '3'
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE
)D ON D.DTE = A.DTE ORDER BY A.DTE DESC

The main problem is how can I automate this to where I'm not explicitly comparing months 2 and 1, and then comparing months 3 and 2, and then comparing months 4 and 3, and so on ?

CodePudding user response:

There is no need to really do this recursively - you can do it using sets of data. .

There are two approaches below - they check, for each employee and month, whether that employee has a later record. The first approach uses LEAD to get later records for an individual; the second approach explicitly checks the next month.

The two approaches will differ if a person 'comes back' several months after leaving - the first approach wouldn't treat it as a departure (as they will have a later record), while the second would (as they did not have a record for that specific month).

This db<>fiddle has the SQL below so you can examine it operating.

Initial table setup:

CREATE TABLE #EmpMonths (EmpDate date, EmployeeID int);
INSERT INTO #EmpMonths (EmpDate, EmployeeID) VALUES
('20220401',    9),
('20220401',    8),
('20220301',    9),
('20220301',    8),
('20220301',    7),
('20220201',    9),
('20220201',    8),
('20220201',    7),
('20220101',    9),
('20220101',    8),
('20220101',    7),
('20220101',    6),
('20220101',    5);

As there are no corresponding later records for the 'latest' record (1 April 2022), in this I will report only months prior to the most recent record For this I use MaxDate. Feel free to use any other approach.

DECLARE @MaxEmpDate date;
SET @MaxEmpDate = (SELECT MAX(EmpDate) FROM #EmpMonths);

Approach 1: Using LEAD

The following SQL gets a 2nd date associated with each employee/month record. This has the month they next appear in the dataset. (Note that I have included it here for your info only - it gets used in the answer in the next block of code).

SELECT      EmployeeID, EmpDate, LEAD(EmpDate, 1) OVER (PARTITION BY EmployeeID ORDER BY EmpDate) AS Later_Date
    FROM    #EmpMonths;

You can then use the above to get the departures by month - a NULL in the Later_Date column indicates they didn't have a later record - so you can count NULL records by month to get your answer.

WITH EmpRecords AS
        (SELECT     EmpDate, LEAD(EmpDate, 1) OVER (PARTITION BY EmployeeID ORDER BY EmpDate) AS Later_Date
            FROM    #EmpMonths
        )
    SELECT  EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    WHERE   EmpDate < @MaxEmpDate
    GROUP BY EmpDate;

Approach 2: Explicitly checking the next month

The alternative approach is to explicitly check the next month, to see if they have a record for that month. The following does this with a LEFT JOIN on the same table (joining on EmployeeID, and 1 month later).

SELECT      CurrentMonth.EmployeeID, CurrentMonth.EmpDate, NextMonth.EmpDate AS Later_Date
    FROM    #EmpMonths AS CurrentMonth
            LEFT OUTER JOIN #EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                    AND DATEADD(month, 1, CurrentMonth.EmpDate) = NextMonth.EmpDate
    WHERE   CurrentMonth.EmpDate < @MaxEmpDate;

Similar to above, you can then count NULLs for each month.

WITH EmpRecords AS
        (SELECT     CurrentMonth.EmployeeID, CurrentMonth.EmpDate, NextMonth.EmpDate AS Later_Date
            FROM    #EmpMonths AS CurrentMonth
                    LEFT OUTER JOIN #EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                            AND DATEADD(month, 1, CurrentMonth.EmpDate) = NextMonth.EmpDate
            WHERE   CurrentMonth.EmpDate < @MaxEmpDate
        )
    SELECT  EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    GROUP BY EmpDate;

Results for both are below.

EmpDate      Num_Departures
2022-01-01   2
2022-02-01   0
2022-03-01   1

CodePudding user response:

Try the following :

select [date], SUM(ch) departures
from
(
  select [date], 
       case 
        when [date] = (select max([date]) from table_name)
          then null
        when not exists(select 1 from table_name D where D.[date]=DATEADD(month, 1, T.[date]) and D.employee_id=T.employee_id)
          then 1 
        else 0
      end as ch
  from table_name T
) A
group by [date]
order by [date] desc

See a demo.

when [date] = (select max([date]) from table_name) to select last month values as null since there is no next month.

when not exists(select 1 ... for each row in the table, checks if an employee is not existed in the next month of the current row month, if the condition is met then return 1 else return 0 (the employee is existed in current row month and next month).

  • Related