Home > Back-end >  Group multiple rows into one base on the value of a column
Group multiple rows into one base on the value of a column

Time:09-16

I want to group multiple rows into one based on the value of a column.

Given this data:

SELECT EmployeeID, RoomId , [Event], EventDate 
FROM TimeLog
WHERE EmployeeID = '107733' 
  AND EventDate BETWEEN '2020/02/26 00:00:00' AND  '2020/02/26 23:59:59' 
ORDER BY EventDate
EmployeeID RoomID Event EventDate
107733 05-27F Login 2020-02-26 07:02:00
107733 05-27F Logout 2020-02-26 08:38:00
107733 05-25F Login 2020-02-26 08:39:00
107733 05-25F Logout 2020-02-26 08:51:00
107733 05-27F Login 2020-02-26 08:52:00
107733 05-27F Logout 2020-02-26 12:00:00

So based on the value of the Event column which contains 2 possible values - Login and Logout, I want to combine 2 rows into one so that the result would be something like this:

EmployeeID RoomID Login Logout
107733 05-27F 2020-02-26 07:02:00 2020-02-26 08:38:00
107733 05-25F 2020-02-26 08:39:00 2020-02-26 08:51:00
107733 05-27F 2020-02-26 08:52:00 2020-02-26 12:00:00

Additional Requirement (if possible):

  • It should be sorted chronologically
  • (EDGE CASE) In case "Login" is missing or has no value, it should get the value of the Logout, same if "Logout" has no value - it should get the value of the "Login" field. So Login and Logout will have the same value if one is missing.
EmployeeID RoomID Event EventDate
107733 05-27F Login 2020-02-26 07:02:00
107733 05-25F Logout 2020-02-26 08:38:00

Note that in this case, this happens in different "Room". On 05-27F there's no Logout, while in 05-25F there's no Login. Expected result would be:

EmployeeID RoomID Login Logout
107733 05-27F 2020-02-26 07:02:00 2020-02-26 07:02:00
107733 05-25F 2020-02-26 08:38:00 2020-02-26 08:38:00

My attempt to solve this problem:

SELECT 
    EmployeeID, 
    RoomID, 
    'Login' = (SELECT TOP 1 EventDate 
               FROM TimeLog li 
               WHERE li.EmployeeID = tl.EmployeeID 
                 AND li.RoomID = tl.RoomID 
                 AND li.[Event] = 1),
    'Logout' = (SELECT TOP 1 EventDate 
                FROM TimeLog lo 
                WHERE lo.EmployeeID = tl.EmployeeID 
                  AND lo.RoomID = tl.RoomID 
                  AND lo.[Event] = 2)
FROM  
    TimeLog tl
WHERE 
    tl.EmployeeID = '107733'
    AND tl.EventDate BETWEEN '2020/02/26 00:00:00' AND '2020/02/26 23:59:59' 

CodePudding user response:

I believe this should do what you're after: DBFiddle Example

with cte as (
  select row_number() over (partition by EmployeeId, RoomId order by EventDate) r
  , *
  from TimeLog
)
select coalesce(i.EventDate, o.EventDate) LoginDate
, coalesce(o.EventDate, i.EventDate) LogoutDate
, coalesce(i.EmployeeId, o.EmployeeId) EmployeeId_
, coalesce(i.RoomId, o.RoomId)  RoomId_
from (select * from cte where Event = 'Login') i
full outer join (select * from cte where Event = 'Logout') o
on o.EmployeeId = i.EmployeeId
and o.RoomId = i.RoomId
and o.r = i.r 1
order by LoginDate, LogoutDate, EmployeeId_, RoomId_

This works by sequencing the events per employee/room combination.

If a logout appears immediately after a login (i.e. the sequence number (r) or the logout event is 1 more than that for the login event) then the 2 are related & put in the same row.

If you get any rows with a login that doesn't have an consecutive logout, or a logout without an immediately prior login, those are seen as having missing data, so you get null in the corresponding logout time / login time columns.

Hope that makes sense & helps - let me know if you need a deeper explation of anything.


Code to create demo data

create table TimeLog (
  EmployeeId bigint not null,
  RoomId nchar(6) not null,
  Event nvarchar(6) not null,
  EventDate DateTime not null
)
insert TimeLog (EmployeeID, RoomId, Event, EventDate)
values (107733, '05-27F',   'Login',    '2020-02-26 07:02:00')
     , (107733, '05-27F',   'Logout',   '2020-02-26 08:38:00')
     , (107733, '05-25F',   'Login',    '2020-02-26 08:39:00')
     , (107733, '05-25F',   'Logout',   '2020-02-26 08:51:00')
     , (107733, '05-25F',   'Login',    '2020-02-27 08:39:00')
     , (107733, '05-25F',   'Logout',   '2020-02-27 08:51:00')
     , (107733, '05-25X',   'Login',    '2020-02-27 08:39:00')
     , (107733, '05-25Y',   'Logout',   '2020-02-27 08:51:00')

Example output

LoginDate LogoutDate EmployeeId_ RoomId_
2020-02-26 07:02:00.000 2020-02-26 08:38:00.000 107733 05-27F
2020-02-26 08:39:00.000 2020-02-26 08:51:00.000 107733 05-25F
2020-02-27 08:39:00.000 2020-02-27 08:39:00.000 107733 05-25X
2020-02-27 08:39:00.000 2020-02-27 08:51:00.000 107733 05-25F
2020-02-27 08:51:00.000 2020-02-27 08:51:00.000 107733 05-25Y

CodePudding user response:

Maybe something like:

SELECT EmployeeID, RoomId,
    MAX(CASE WHEN Event = 'Login'  THEN EventDate ELSE NULL END) AS LatestLogin,
    MAX(CASE WHEN Event = 'Logout' THEN EventDate ELSE NULL END) AS LatestLogout,
FROM Wherever
GROUP BY EmployeeID, RoomId

CodePudding user response:

If you're firmly in SQL Server world, PIVOT is also an option here:

SELECT
    EmployeeID,
    RoomID,
    ISNULL([Login], [Logout]) AS LatestLogin,
    ISNULL([Logout], [Login]) AS LatestLogout
FROM
    TimeLog
PIVOT (
    MAX(EventDate)
    FOR Event IN ([Login], [Logout])
    ) AS PivotTable
ORDER BY
    LatestLogin
;
  • Related