I've killed my spreadsheet - who knew querying 180k rows would be too much. :)
I've got my database replacement planned out, but have come up against a couple of stumbling blocks while normalising and cleaning the spreadsheet data.
I have a table of data similar to below called RD1
.
WorkDate | EmpNo | Contract |
---|---|---|
13-Mar-21 | 123 | 16 |
14-Mar-21 | 123 | 16 |
15-Mar-21 | 123 | 16 |
16-Mar-21 | 123 | 40 |
17-Mar-21 | 123 | 40 |
18-Mar-21 | 123 | 16 |
19-Mar-21 | 123 | 16 |
I am trying to group the table so I have the start and end date for each change in Contract.
My final table would look like:
EmpNo | Contract | StartDate | EndDate |
---|---|---|---|
123 | 16 | 13-Mar-21 | 15-Mar-21 |
123 | 40 | 16-Mar-21 | 17-Mar-21 |
123 | 16 | 18-Mar-21 | 19-Mar-21 |
As expected, the SQL below returns a two row table.
SELECT EmpNo, Contract,
MIN(WorkDate) AS StartDate,
MAX(WorkDate) AS EndDate
FROM RD1
GROUP BY EmpNo, Contract
EmpNo | Contract | StartDate | EndDate |
---|---|---|---|
123 | 16 | 13/03/2021 | 19/03/2021 |
123 | 40 | 16/03/2021 | 17/03/2021 |
My mind is blank on how to solve this - been over ten years since I was writing SQL regularly.
Any help would be greatly appreciated.
Edit
A further thought I had was that the WorkDate is sequential with no dates missing so I can join the table to itself a couple of times to identify the start and end of each group. For now this is just called Query1
.
SELECT RD1.WorkDate,
RD1.EmpNo,
RD1.Contract,
IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND
(RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
(RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")) AS Identifier
FROM (RD1 LEFT JOIN RD1 RD2 ON RD1.EmpNo = RD2.EmpNo AND
RD1.WorkDate = RD2.WorkDate 1)
LEFT JOIN RD1 RD3 ON RD1.EmpNo = RD3.EmpNo AND
RD1.WorkDate = RD3.WorkDate-1
WHERE NOT ISNULL(IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND
(RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
(RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")))
This gives the following table:
WorkDate | EmpNo | Contract | Identifier |
---|---|---|---|
13-Mar-21 | 123 | 16 | Start |
15-Mar-21 | 123 | 16 | End |
16-Mar-21 | 123 | 40 | Start |
17-Mar-21 | 123 | 40 | End |
18-Mar-21 | 123 | 16 | Start |
19-Mar-21 | 123 | 16 | End |
I can then use Query1
to get Start/End date pairs, but this joins each start date with each end date.
SELECT T1.EmpNo,
T1.Contract,
T1.Start,
T2.End
FROM (SELECT EmpNo, Contract, WorkDate AS Start
FROM Query1 WHERE Identifier = 'Start') AS T1 LEFT JOIN
(SELECT EmpNo, Contract, WorkDate AS End FROM Query1 WHERE Identifier = 'End') AS T2
ON (T1.Start<=T2.End) AND (T1.EmpNo = T2.EmpNo)
EmpNo | Contract | Start | End |
---|---|---|---|
123 | 16 | 13-Mar-21 | 19-Mar-21 |
123 | 16 | 13-Mar-21 | 17-Mar-21 |
123 | 16 | 13-Mar-21 | 15-Mar-21 |
123 | 40 | 16-Mar-21 | 19-Mar-21 |
123 | 40 | 16-Mar-21 | 17-Mar-21 |
123 | 16 | 18-Mar-21 | 19-Mar-21 |
I guess I need it to get the minimum end date that is greater than the start date:
SELECT EmpNo,
Contract,
Start,
Min(End) AS MinOfEnd
FROM Query6
GROUP BY EmpNo, Contract, Start
ORDER BY Start
EmpNo | Contract | Start | MinOfEnd |
---|---|---|---|
123 | 16 | 13-Mar-21 | 15/03/2021 |
123 | 40 | 16-Mar-21 | 17/03/2021 |
123 | 16 | 18-Mar-21 | 19/03/2021 |
Hmmm, I seem to have answered it.... I'll do some checking, tidying up and add it as an answer unless anyone can see any potential problems?
Not quite solved - single days will give an incorrect end date rather than the same date as the start.
CodePudding user response:
Need a group identifier. Using the given sample: the first 3 records are group 1, next 2 are group 2, last 2 are group 3. Use VBA behind Excel to populate a column. The group ID changes when the Contract value changes.
Sub SetGID()
Dim c As Integer, r As Integer, x As Integer, i As Integer
c = 1
r = 2
i = 1
x = Cells(r, 4)
Do While Not IsEmpty(Cells(r, 2))
Cells(r, 1) = i
r = r 1
If Cells(r, 4) <> x Then
i = i 1
x = Cells(r, 4)
End If
Loop
End Sub
Or import the data as is to Access and run similar procedure to update a field in table. This would involve opening a recordset and looping its records.
Then aggregate query summarizes on the GroupID.
SELECT GroupID, EmpNo, Contract,
MIN(WorkDate) AS StartDate,
MAX(WorkDate) AS EndDate
FROM RD1
GROUP BY GroupID, EmpNo, Contract
CodePudding user response:
One point I missed from my question, which is fairly important, is that when an EmpNo
starts they will have sequential dates up to when they finish with no missing dates. With this in mind you can join the table to itself to return the previous and next day figures.
RD1
WorkDate | EmpNo | Contract |
---|---|---|
13-Mar-21 | 123 | 16 |
14-Mar-21 | 123 | 16 |
15-Mar-21 | 123 | 20 |
16-Mar-21 | 123 | 40 |
17-Mar-21 | 123 | 40 |
18-Mar-21 | 123 | 40 |
19-Mar-21 | 123 | 16 |
Query8
The first query will have two joins. The first joins the table with itself so 14-Mar-21 is connected to 15-Mar-21 and so on. The second joins the table with itself again so that 14-Mar-21 is connected to 13-Mar-21, etc.
With these joins you can check if the current Contract value is different from the previous or next days value - therefore identify the start and end of each block.
(I could probably tidy up the WHERE clause a bit)
SELECT RD1.WorkDate,
RD1.EmpNo,
RD1.Contract,
IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND
(RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start') AS StartID,
IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
(RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End") AS EndID
FROM (RD1 LEFT JOIN RD1 RD2 ON RD1.EmpNo = RD2.EmpNo AND
RD1.WorkDate = RD2.WorkDate 1)
LEFT JOIN RD1 RD3 ON RD1.EmpNo = RD3.EmpNo AND
RD1.WorkDate = RD3.WorkDate-1
WHERE NOT ISNULL(IIF((RD1.EmpNo = RD2.EmpNo OR ISNULL(RD2.EmpNo)) AND
(RD1.Contract <> RD2.Contract OR ISNULL(RD2.Contract)),'Start',
IIF((RD1.EmpNo = RD3.EmpNo OR ISNULL(RD3.EmpNo)) AND
(RD1.Contract <> RD3.Contract OR ISNULL(RD3.Contract)),"End")))
WorkDate | EmpNo | Contract | StartID | EndID |
---|---|---|---|---|
13-Mar-21 | 123 | 16 | Start | |
14-Mar-21 | 123 | 16 | End | |
15-Mar-21 | 123 | 20 | Start | End |
16-Mar-21 | 123 | 40 | Start | |
17-Mar-21 | 123 | 40 | ||
18-Mar-21 | 123 | 40 | End | |
19-Mar-21 | 123 | 16 | Start | End |
Query9
The next query joins each start date to each end date providing the start date is less than or equal to the end date.
SELECT T1.EmpNo,
T1.Contract,
T1.Start,
T2.End
FROM (
SELECT EmpNo, Contract, WorkDate AS Start
FROM Query8
WHERE NOT ISNULL(StartID)
) AS T1 LEFT JOIN
(
SELECT EmpNo, Contract, WorkDate AS [End]
FROM Query8
WHERE NOT ISNULL(EndID)) AS T2
ON (T1.Start<=T2.End) AND (T1.EmpNo = T2.EmpNo)
EmpNo | Contract | Start | End |
---|---|---|---|
123 | 16 | 13-Mar-21 | 19-Mar-21 |
123 | 16 | 13-Mar-21 | 18-Mar-21 |
123 | 16 | 13-Mar-21 | 15-Mar-21 |
123 | 16 | 13-Mar-21 | 14-Mar-21 |
123 | 20 | 15-Mar-21 | 19-Mar-21 |
123 | 20 | 15-Mar-21 | 18-Mar-21 |
123 | 20 | 15-Mar-21 | 15-Mar-21 |
123 | 40 | 16-Mar-21 | 19-Mar-21 |
123 | 40 | 16-Mar-21 | 18-Mar-21 |
123 | 16 | 19-Mar-21 | 19-Mar-21 |
Query10
The final query groups the data by EmpNo
, Contract
and Start
and takes the minimum end date for each group.
SELECT EmpNo,
Contract,
Start,
Min(End) AS MinOfEnd
FROM Query9
GROUP BY EmpNo, Contract, Start
ORDER BY EmpNo, Start
EmpNo | Contract | Start | MinOfEnd |
---|---|---|---|
123 | 16 | 13-Mar-21 | 14/03/2021 |
123 | 20 | 15-Mar-21 | 15/03/2021 |
123 | 40 | 16-Mar-21 | 18/03/2021 |
123 | 16 | 19-Mar-21 | 19/03/2021 |