Home > Software engineering >  Having issue in determining the correct Entry time and Exit time
Having issue in determining the correct Entry time and Exit time

Time:07-19

I am trying to identify the entry and exit time of the data. The calculation of exit time is, If the activity is not in the next activity line, then that is the exit time for that activity.

Can someone please help me to fix the below SQL.

Example: Here is the sample data

| VisitID | Line | ActivityName |   ActivityTime    |
-----------------------------------------------------
|   123   |  1   |   Activity1  |'2022-07-01 03:00' |
|   123   |  1   |   Activity2  |'2022-07-01 03:00' |
|   123   |  1   |   Activity3  |'2022-07-01 03:00' |
|   123   |  2   |   Activity1  |'2022-07-01 04:10' |
|   123   |  2   |   Activity2  |'2022-07-01 04:10' |
|   123   |  2   |   Activity3  |'2022-07-01 04:10' |
|   123   |  3   |   Activity1  |'2022-07-02 07:00' |
|   123   |  3   |   Activity3  |'2022-07-02 07:00' |
|   123   |  4   |   Activity1  |'2022-07-03 09:00' |
|   123   |  5   |   Activity2  |'2022-07-04 05:00' |

Here is how the output should be

| VisitID | ActivityName |     StartTime     |      EndTime     |
-----------------------------------------------------------------------
|   123   |  Activity1  |'2022-07-01 03:00' |'2022-07-04 05:00' |
|   123   |  Activity2  |'2022-07-01 03:00' |'2022-07-02 07:00' |
|   123   |  Activity3  |'2022-07-01 03:00' |'2022-07-03 09:00' |
|   123   |  Activity2  |'2022-07-04 05:00' |'2022-07-04 05:00' |

Sample SQL Table

DECLARE @T AS TABLE
(
            VisitID    INT,
            Line        INT,
            ActivityName Nvarchar(25),
            ActivityTime datetime
)


insert into @T VALUES

(123, 1, 'Activity1', '2022-07-01 03:00' ),
(123, 1, 'Activity2', '2022-07-01 03:00' ),
(123, 1, 'Activity3', '2022-07-01 03:00' ),
(123, 2, 'Activity1', '2022-07-01 04:10' ),
(123, 2, 'Activity2', '2022-07-01 04:10' ),
(123, 2, 'Activity3', '2022-07-01 04:10' ),
(123, 3, 'Activity1', '2022-07-02 07:00' ),
(123, 3, 'Activity3', '2022-07-02 07:00' ),
(123, 4, 'Activity1', '2022-07-03 09:00' ),
(123, 5, 'Activity2', '2022-07-04 05:00' )

SQL I have so far:

select tmp.ActivityName
    ,[Entry date] = min(tmp.ActivityTime)
    ,[Exit date] = max(tmp.ActivityTime)
from
(
    select t.ActivityName , t.ActivityTime
        , [Dense Rank] = DENSE_RANK() OVER (PARTITION BY t.ActivityName  ORDER BY t.ActivityTime)
        , [Group] = DATEADD(DAY,-1 * DENSE_RANK() OVER (PARTITION BY t.ActivityName  ORDER BY t.ActivityTime) ,t.ActivityTime)    
    from @T t
) tmp    
group by tmp.ActivityName,  tmp.[Group]

CodePudding user response:

Assuming the value of ActivityTime is functionally dependent upon the value of Line, I believe the following query gets you what you want.

I have made no attempt to optimize the logic for performance. It is a "direct" translation of the requirement logic into SQL.

How it works:

For each row in the original rowset, it finds, in order of line ascending, the first row in the set of rows which have a higher line than the original row's line, but the original row's activityName does not exist in the set. I alias this set as nxt, ie, "next".

However, this is not sufficent because a given ActivityName might start and end more than once. For example, "Activity2" is in the set where line = 1, then disappears once we reach line = 4, but reappears when line = 5.

Therefore I group the intermediate results by the value of line in the set we retrieved in step 1. I can also safely group by nxt.ActivityTime, because nxt.ActivityTime is functionally dependent upon nxt.line (it's not necessary to include line, here, but it makes it much easier to read the ungrouped output).

The starting time for the row is then just the minimum value of ActivityTIme from the original rows within this group.

Finally, if there is no "following set" (such as is the case with "Activity2" on line 5), I infer from your expected results that you want the EndTime to just be the same as the StartTime. This is what the isnull is for.

To understand it better I suggest running the query without the group by or min to see the intermediate results.


select         t.VisitId,
               t.ActivityName,
               StartTime = min(t.ActivityTime),
               EndTime = isnull(nxt.ActivityTime, min(t.ActivityTime))
from           @t t
outer apply    (
                  select   top 1 line, activityTime
                  from     @t t2
                  where    t2.visitId = t.visitId
                           and t2.line > t.line
                           and not exists
                           (
                              select *
                              from   @t t3
                              where  t3.activityName = t.activityName
                                     and t3.visitId = t2.visitId
                                     and t3.line = t2.line
                           )
                  order by t2.line asc
               ) nxt
group by       t.VisitId,
               t.ActivityName,
               nxt.line,
               nxt.ActivityTime
order by       min(t.ActivityTime);
  • Related