I have a source table as follow:
I want to get the below result:
EXPLANATION OF THE SOURCE TABLE:
This table contains requests with their related tasks on specific dates, each request can have multiple tasks and these tasks can occur many times, I marked each request and it related tasks with different color
For example, the request = 'NC2' in blue has 3 tasks:
- task1 occurs 3 times over time.
- task2 occurs 2 times over time.
- task3 occurs 1 time over time.
EXPLANATION OF THE RESULT TABLE:
The score indicates the number of tasks for each request based on time of occurrence.
For example, the request = 'NC2' in blue has 3 tasks:
SCORE COLUMN:
task1: first occurrence was on 1/5/2022, then it occurs on 1/7/2022 and lastly on 1/9/2022 "for the same request number"
, as marked on yellow, the same for task two it only occurs twis on 1/6/2022 and 1/8/2022 and for the task3 also.
You can see the score column contains the scores of the tasks based on their occurrence over time for specific request number.
STATUS COLUMN:
For the status it shows the first and last occurrence of a task for a specific request, and it ignores the middles like task1 marked in red it's not the first occurrence and neither the last, so it was ignored.
What I was able to achieve:
With the query below I was able to get SCORE column:
select RequestNumber, Task, StartDate, row_number() over(partition by RequestNumber, TaskName
order by START_DATE) as score
from [SOURCE_TABLE] order by RequestNumber, START_DATE
for the STATUS I am lost, and I couldn't solve it, probably I am thinking inside the box, so any suggestion are welcome.
CodePudding user response:
A slightly more efficient version of @Larnu's excellent answer:
SELECT
RequestNumber,
Task,
StartDate,
ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) AS score,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) = 1
THEN 'First'
WHEN LEAD(StartDate) OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) IS NULL
THEN 'Last'
END AS Status
FROM dbo.[SOURCE_TABLE];
It uses LEAD
instead of another ROW_NUMBER
. This means that the same windowing clause can be used, and therefore does not require another sort, nor a spool.
CodePudding user response:
One method would be to use a CASE
expression on the value of score
. As you'd need to check if the value of Score
was the MAX
value, then you'll need to use a CTE for this, like so:
WITH CTE AS
(SELECT RequestNumber,
Task,
StartDate,
ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) AS score
FROM dbo.[SOURCE_TABLE])
SELECT RequestNumber,
Task,
StartDate,
score,
CASE score WHEN 1 THEN 'First'
WHEN MAX(score) OVER (PARTITION BY RequestNumber, TaskName) THEN 'Last'
END AS Status
FROM CTE
ORDER BY RequestNumber,
START_DATE;
Alternatively you could use ROW_NUMBER
again, but in the opposite direction, and the you wouldn't need the CTE:
SELECT RequestNumber,
Task,
StartDate,
ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) AS score,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) = 1 THEN 'First'
WHEN ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE DESC) = 1 THEN 'Last'
END AS Status
FROM dbo.[SOURCE_TABLE];
CodePudding user response:
You are pretty close, I'd go for something like
SELECT
RequestNumber
,Task
,StartDate
,AscRowNo AS Score
,CASE
WHEN AscRowNo = 1 AND DescRowNo = 1 THEN '???'
WHEN AscRowNo = 1 THEN 'First'
WHEN DescRowNo = 1 THEN 'Last'
ELSE NULL
END AS Status
FROM (
SELECT
RequestNumber
,Task
,StartDate
,row_number() over(partition by RequestNumber, TaskName order by START_DATE) as AscRowNo
,row_number() over(partition by RequestNumber, TaskName order by START_DATE DESC) as DescRowNo
from [SOURCE_TABLE]
) as x
ORDER BY
RequestNumber,
START_DATE
The two ROW_Numbers will identify the first and last occurrence of a task.
Note: what if there is only one occurrence globally? is it first or last?