Home > Software engineering >  MIN & MAX date in group where group repeats
MIN & MAX date in group where group repeats

Time:06-25

I've killed my spreadsheet - who knew querying 180k rows would be too much. :)
I've got my database replacement planned out, but have come up against a couple of stumbling blocks while normalising and cleaning the spreadsheet data.

I have a table of data similar to below called RD1.

WorkDate EmpNo Contract
13-Mar-21 123 16
14-Mar-21 123 16
15-Mar-21 123 16
16-Mar-21 123 40
17-Mar-21 123 40
18-Mar-21 123 16
19-Mar-21 123 16

I am trying to group the table so I have the start and end date for each change in Contract.
My final table would look like:

EmpNo Contract StartDate EndDate
123 16 13-Mar-21 15-Mar-21
123 40 16-Mar-21 17-Mar-21
123 16 18-Mar-21 19-Mar-21

As expected, the SQL below returns a two row table.

SELECT          EmpNo, Contract, 
                MIN(WorkDate) AS StartDate, 
                MAX(WorkDate) AS EndDate
FROM            RD1
GROUP BY        EmpNo, Contract  
EmpNo Contract StartDate EndDate
123 16 13/03/2021 19/03/2021
123 40 16/03/2021 17/03/2021

My mind is blank on how to solve this - been over ten years since I was writing SQL regularly.
Any help would be greatly appreciated.

Edit
A further thought I had was that the WorkDate is sequential with no dates missing so I can join the table to itself a couple of times to identify the start and end of each group. For now this is just called Query1.

SELECT  RD1.WorkDate,
        RD1.EmpNo,
        RD1.Contract,
        IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND 
            (RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
            IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
                 (RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")) AS Identifier
FROM    (RD1 LEFT JOIN RD1 RD2 ON RD1.EmpNo = RD2.EmpNo AND
                                  RD1.WorkDate = RD2.WorkDate 1)
             LEFT JOIN RD1 RD3 ON RD1.EmpNo = RD3.EmpNo AND
                                  RD1.WorkDate = RD3.WorkDate-1                           
WHERE   NOT ISNULL(IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND 
                       (RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
                       IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
                           (RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")))

This gives the following table:

WorkDate EmpNo Contract Identifier
13-Mar-21 123 16 Start
15-Mar-21 123 16 End
16-Mar-21 123 40 Start
17-Mar-21 123 40 End
18-Mar-21 123 16 Start
19-Mar-21 123 16 End

I can then use Query1 to get Start/End date pairs, but this joins each start date with each end date.

SELECT  T1.EmpNo, 
        T1.Contract, 
        T1.Start, 
        T2.End
FROM    (SELECT EmpNo, Contract, WorkDate AS Start 
         FROM Query1 WHERE Identifier = 'Start')  AS T1 LEFT JOIN 
        (SELECT EmpNo, Contract, WorkDate AS End FROM Query1 WHERE Identifier = 'End')  AS T2 
            ON (T1.Start<=T2.End) AND (T1.EmpNo = T2.EmpNo) 
EmpNo Contract Start End
123 16 13-Mar-21 19-Mar-21
123 16 13-Mar-21 17-Mar-21
123 16 13-Mar-21 15-Mar-21
123 40 16-Mar-21 19-Mar-21
123 40 16-Mar-21 17-Mar-21
123 16 18-Mar-21 19-Mar-21

I guess I need it to get the minimum end date that is greater than the start date:

SELECT  EmpNo, 
        Contract, 
        Start, 
        Min(End) AS MinOfEnd
FROM    Query6
GROUP BY EmpNo, Contract, Start
ORDER BY Start  
EmpNo Contract Start MinOfEnd
123 16 13-Mar-21 15/03/2021
123 40 16-Mar-21 17/03/2021
123 16 18-Mar-21 19/03/2021

Hmmm, I seem to have answered it.... I'll do some checking, tidying up and add it as an answer unless anyone can see any potential problems?

Not quite solved - single days will give an incorrect end date rather than the same date as the start.

CodePudding user response:

Need a group identifier. Using the given sample: the first 3 records are group 1, next 2 are group 2, last 2 are group 3. Use VBA behind Excel to populate a column. The group ID changes when the Contract value changes.

Sub SetGID()
Dim c As Integer, r As Integer, x As Integer, i As Integer
c = 1
r = 2
i = 1
x = Cells(r, 4)
Do While Not IsEmpty(Cells(r, 2))
    Cells(r, 1) = i
    r = r   1
    If Cells(r, 4) <> x Then
        i = i   1
        x = Cells(r, 4)
    End If
Loop
End Sub

Or import the data as is to Access and run similar procedure to update a field in table. This would involve opening a recordset and looping its records.

Then aggregate query summarizes on the GroupID.

SELECT          GroupID, EmpNo, Contract, 
                MIN(WorkDate) AS StartDate, 
                MAX(WorkDate) AS EndDate
FROM            RD1
GROUP BY        GroupID, EmpNo, Contract  

CodePudding user response:

One point I missed from my question, which is fairly important, is that when an EmpNo starts they will have sequential dates up to when they finish with no missing dates. With this in mind you can join the table to itself to return the previous and next day figures.

RD1

WorkDate EmpNo Contract
13-Mar-21 123 16
14-Mar-21 123 16
15-Mar-21 123 20
16-Mar-21 123 40
17-Mar-21 123 40
18-Mar-21 123 40
19-Mar-21 123 16

Query8

The first query will have two joins. The first joins the table with itself so 14-Mar-21 is connected to 15-Mar-21 and so on. The second joins the table with itself again so that 14-Mar-21 is connected to 13-Mar-21, etc.
With these joins you can check if the current Contract value is different from the previous or next days value - therefore identify the start and end of each block.
(I could probably tidy up the WHERE clause a bit)

SELECT  RD1.WorkDate,
        RD1.EmpNo,
        RD1.Contract,
        IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND 
            (RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start') AS StartID,
        IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
                 (RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End") AS EndID
FROM    (RD1 LEFT JOIN RD1 RD2 ON RD1.EmpNo = RD2.EmpNo AND
                                  RD1.WorkDate = RD2.WorkDate 1)
             LEFT JOIN RD1 RD3 ON RD1.EmpNo = RD3.EmpNo AND
                                  RD1.WorkDate = RD3.WorkDate-1                           
WHERE   NOT ISNULL(IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND 
                       (RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
                       IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
                           (RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")))  
WorkDate EmpNo Contract StartID EndID
13-Mar-21 123 16 Start
14-Mar-21 123 16 End
15-Mar-21 123 20 Start End
16-Mar-21 123 40 Start
17-Mar-21 123 40
18-Mar-21 123 40 End
19-Mar-21 123 16 Start End

Query9
The next query joins each start date to each end date providing the start date is less than or equal to the end date.

SELECT  T1.EmpNo, 
        T1.Contract, 
        T1.Start, 
        T2.End
FROM    (
        SELECT  EmpNo, Contract, WorkDate AS Start 
        FROM    Query8 
        WHERE   NOT ISNULL(StartID)
        )  AS T1 LEFT JOIN 
        (
        SELECT  EmpNo, Contract, WorkDate AS [End] 
        FROM    Query8 
        WHERE   NOT ISNULL(EndID))  AS T2 
        ON (T1.Start<=T2.End) AND (T1.EmpNo = T2.EmpNo)  
EmpNo Contract Start End
123 16 13-Mar-21 19-Mar-21
123 16 13-Mar-21 18-Mar-21
123 16 13-Mar-21 15-Mar-21
123 16 13-Mar-21 14-Mar-21
123 20 15-Mar-21 19-Mar-21
123 20 15-Mar-21 18-Mar-21
123 20 15-Mar-21 15-Mar-21
123 40 16-Mar-21 19-Mar-21
123 40 16-Mar-21 18-Mar-21
123 16 19-Mar-21 19-Mar-21

Query10
The final query groups the data by EmpNo, Contract and Start and takes the minimum end date for each group.

SELECT      EmpNo, 
            Contract, 
            Start, 
            Min(End) AS MinOfEnd
FROM        Query9
GROUP BY    EmpNo, Contract, Start
ORDER BY    EmpNo, Start  
EmpNo Contract Start MinOfEnd
123 16 13-Mar-21 14/03/2021
123 20 15-Mar-21 15/03/2021
123 40 16-Mar-21 18/03/2021
123 16 19-Mar-21 19/03/2021
  • Related