I am trying to identify the entry and exit time of the data. The calculation of exit time is, If the activity is not in the next activity line, then that is the exit time for that activity.
Can someone please help me to fix the below SQL.
Example: Here is the sample data
| VisitID | Line | ActivityName | ActivityTime |
-----------------------------------------------------
| 123 | 1 | Activity1 |'2022-07-01 03:00' |
| 123 | 1 | Activity2 |'2022-07-01 03:00' |
| 123 | 1 | Activity3 |'2022-07-01 03:00' |
| 123 | 2 | Activity1 |'2022-07-01 04:10' |
| 123 | 2 | Activity2 |'2022-07-01 04:10' |
| 123 | 2 | Activity3 |'2022-07-01 04:10' |
| 123 | 3 | Activity1 |'2022-07-02 07:00' |
| 123 | 3 | Activity3 |'2022-07-02 07:00' |
| 123 | 4 | Activity1 |'2022-07-03 09:00' |
| 123 | 5 | Activity2 |'2022-07-04 05:00' |
Here is how the output should be
| VisitID | ActivityName | StartTime | EndTime |
-----------------------------------------------------------------------
| 123 | Activity1 |'2022-07-01 03:00' |'2022-07-04 05:00' |
| 123 | Activity2 |'2022-07-01 03:00' |'2022-07-02 07:00' |
| 123 | Activity3 |'2022-07-01 03:00' |'2022-07-03 09:00' |
| 123 | Activity2 |'2022-07-04 05:00' |'2022-07-04 05:00' |
Sample SQL Table
DECLARE @T AS TABLE
(
VisitID INT,
Line INT,
ActivityName Nvarchar(25),
ActivityTime datetime
)
insert into @T VALUES
(123, 1, 'Activity1', '2022-07-01 03:00' ),
(123, 1, 'Activity2', '2022-07-01 03:00' ),
(123, 1, 'Activity3', '2022-07-01 03:00' ),
(123, 2, 'Activity1', '2022-07-01 04:10' ),
(123, 2, 'Activity2', '2022-07-01 04:10' ),
(123, 2, 'Activity3', '2022-07-01 04:10' ),
(123, 3, 'Activity1', '2022-07-02 07:00' ),
(123, 3, 'Activity3', '2022-07-02 07:00' ),
(123, 4, 'Activity1', '2022-07-03 09:00' ),
(123, 5, 'Activity2', '2022-07-04 05:00' )
SQL I have so far:
select tmp.ActivityName
,[Entry date] = min(tmp.ActivityTime)
,[Exit date] = max(tmp.ActivityTime)
from
(
select t.ActivityName , t.ActivityTime
, [Dense Rank] = DENSE_RANK() OVER (PARTITION BY t.ActivityName ORDER BY t.ActivityTime)
, [Group] = DATEADD(DAY,-1 * DENSE_RANK() OVER (PARTITION BY t.ActivityName ORDER BY t.ActivityTime) ,t.ActivityTime)
from @T t
) tmp
group by tmp.ActivityName, tmp.[Group]
CodePudding user response:
Assuming the value of ActivityTime
is functionally dependent upon the value of Line
, I believe the following query gets you what you want.
I have made no attempt to optimize the logic for performance. It is a "direct" translation of the requirement logic into SQL.
How it works:
For each row in the original rowset, it finds, in order of line
ascending, the first row in the set of rows which have a higher line
than the original row's line
, but the original row's activityName
does not exist in the set. I alias this set as nxt
, ie, "next".
However, this is not sufficent because a given ActivityName
might start and end more than once. For example, "Activity2" is in the set where line = 1
, then disappears once we reach line = 4
, but reappears when line = 5
.
Therefore I group the intermediate results by the value of line
in the set we retrieved in step 1. I can also safely group by nxt.ActivityTime
, because nxt.ActivityTime
is functionally dependent upon nxt.line
(it's not necessary to include line
, here, but it makes it much easier to read the ungrouped output).
The starting time for the row is then just the minimum value of ActivityTIme
from the original rows within this group.
Finally, if there is no "following set" (such as is the case with "Activity2" on line 5), I infer from your expected results that you want the EndTime
to just be the same as the StartTime
. This is what the isnull
is for.
To understand it better I suggest running the query without the group by
or min
to see the intermediate results.
select t.VisitId,
t.ActivityName,
StartTime = min(t.ActivityTime),
EndTime = isnull(nxt.ActivityTime, min(t.ActivityTime))
from @t t
outer apply (
select top 1 line, activityTime
from @t t2
where t2.visitId = t.visitId
and t2.line > t.line
and not exists
(
select *
from @t t3
where t3.activityName = t.activityName
and t3.visitId = t2.visitId
and t3.line = t2.line
)
order by t2.line asc
) nxt
group by t.VisitId,
t.ActivityName,
nxt.line,
nxt.ActivityTime
order by min(t.ActivityTime);