I want to group multiple rows into one based on the value of a column.
Given this data:
SELECT EmployeeID, RoomId , [Event], EventDate
FROM TimeLog
WHERE EmployeeID = '107733'
AND EventDate BETWEEN '2020/02/26 00:00:00' AND '2020/02/26 23:59:59'
ORDER BY EventDate
EmployeeID | RoomID | Event | EventDate |
---|---|---|---|
107733 | 05-27F | Login | 2020-02-26 07:02:00 |
107733 | 05-27F | Logout | 2020-02-26 08:38:00 |
107733 | 05-25F | Login | 2020-02-26 08:39:00 |
107733 | 05-25F | Logout | 2020-02-26 08:51:00 |
107733 | 05-27F | Login | 2020-02-26 08:52:00 |
107733 | 05-27F | Logout | 2020-02-26 12:00:00 |
So based on the value of the Event column which contains 2 possible values - Login and Logout, I want to combine 2 rows into one so that the result would be something like this:
EmployeeID | RoomID | Login | Logout |
---|---|---|---|
107733 | 05-27F | 2020-02-26 07:02:00 | 2020-02-26 08:38:00 |
107733 | 05-25F | 2020-02-26 08:39:00 | 2020-02-26 08:51:00 |
107733 | 05-27F | 2020-02-26 08:52:00 | 2020-02-26 12:00:00 |
Additional Requirement (if possible):
- It should be sorted chronologically
- (EDGE CASE) In case "Login" is missing or has no value, it should get the value of the Logout, same if "Logout" has no value - it should get the value of the "Login" field. So Login and Logout will have the same value if one is missing.
EmployeeID | RoomID | Event | EventDate |
---|---|---|---|
107733 | 05-27F | Login | 2020-02-26 07:02:00 |
107733 | 05-25F | Logout | 2020-02-26 08:38:00 |
Note that in this case, this happens in different "Room". On 05-27F there's no Logout, while in 05-25F there's no Login. Expected result would be:
EmployeeID | RoomID | Login | Logout |
---|---|---|---|
107733 | 05-27F | 2020-02-26 07:02:00 | 2020-02-26 07:02:00 |
107733 | 05-25F | 2020-02-26 08:38:00 | 2020-02-26 08:38:00 |
My attempt to solve this problem:
SELECT
EmployeeID,
RoomID,
'Login' = (SELECT TOP 1 EventDate
FROM TimeLog li
WHERE li.EmployeeID = tl.EmployeeID
AND li.RoomID = tl.RoomID
AND li.[Event] = 1),
'Logout' = (SELECT TOP 1 EventDate
FROM TimeLog lo
WHERE lo.EmployeeID = tl.EmployeeID
AND lo.RoomID = tl.RoomID
AND lo.[Event] = 2)
FROM
TimeLog tl
WHERE
tl.EmployeeID = '107733'
AND tl.EventDate BETWEEN '2020/02/26 00:00:00' AND '2020/02/26 23:59:59'
CodePudding user response:
I believe this should do what you're after: DBFiddle Example
with cte as (
select row_number() over (partition by EmployeeId, RoomId order by EventDate) r
, *
from TimeLog
)
select coalesce(i.EventDate, o.EventDate) LoginDate
, coalesce(o.EventDate, i.EventDate) LogoutDate
, coalesce(i.EmployeeId, o.EmployeeId) EmployeeId_
, coalesce(i.RoomId, o.RoomId) RoomId_
from (select * from cte where Event = 'Login') i
full outer join (select * from cte where Event = 'Logout') o
on o.EmployeeId = i.EmployeeId
and o.RoomId = i.RoomId
and o.r = i.r 1
order by LoginDate, LogoutDate, EmployeeId_, RoomId_
This works by sequencing the events per employee/room combination.
If a logout appears immediately after a login (i.e. the sequence number (r) or the logout event is 1 more than that for the login event) then the 2 are related & put in the same row.
If you get any rows with a login that doesn't have an consecutive logout, or a logout without an immediately prior login, those are seen as having missing data, so you get null in the corresponding logout time / login time columns.
Hope that makes sense & helps - let me know if you need a deeper explation of anything.
Code to create demo data
create table TimeLog (
EmployeeId bigint not null,
RoomId nchar(6) not null,
Event nvarchar(6) not null,
EventDate DateTime not null
)
insert TimeLog (EmployeeID, RoomId, Event, EventDate)
values (107733, '05-27F', 'Login', '2020-02-26 07:02:00')
, (107733, '05-27F', 'Logout', '2020-02-26 08:38:00')
, (107733, '05-25F', 'Login', '2020-02-26 08:39:00')
, (107733, '05-25F', 'Logout', '2020-02-26 08:51:00')
, (107733, '05-25F', 'Login', '2020-02-27 08:39:00')
, (107733, '05-25F', 'Logout', '2020-02-27 08:51:00')
, (107733, '05-25X', 'Login', '2020-02-27 08:39:00')
, (107733, '05-25Y', 'Logout', '2020-02-27 08:51:00')
Example output
LoginDate | LogoutDate | EmployeeId_ | RoomId_ |
---|---|---|---|
2020-02-26 07:02:00.000 | 2020-02-26 08:38:00.000 | 107733 | 05-27F |
2020-02-26 08:39:00.000 | 2020-02-26 08:51:00.000 | 107733 | 05-25F |
2020-02-27 08:39:00.000 | 2020-02-27 08:39:00.000 | 107733 | 05-25X |
2020-02-27 08:39:00.000 | 2020-02-27 08:51:00.000 | 107733 | 05-25F |
2020-02-27 08:51:00.000 | 2020-02-27 08:51:00.000 | 107733 | 05-25Y |
CodePudding user response:
Maybe something like:
SELECT EmployeeID, RoomId,
MAX(CASE WHEN Event = 'Login' THEN EventDate ELSE NULL END) AS LatestLogin,
MAX(CASE WHEN Event = 'Logout' THEN EventDate ELSE NULL END) AS LatestLogout,
FROM Wherever
GROUP BY EmployeeID, RoomId
CodePudding user response:
If you're firmly in SQL Server world, PIVOT is also an option here:
SELECT
EmployeeID,
RoomID,
ISNULL([Login], [Logout]) AS LatestLogin,
ISNULL([Logout], [Login]) AS LatestLogout
FROM
TimeLog
PIVOT (
MAX(EventDate)
FOR Event IN ([Login], [Logout])
) AS PivotTable
ORDER BY
LatestLogin
;