Home > Net >  Status of values generated by OVER and PARTITION BY in SQL Server
Status of values generated by OVER and PARTITION BY in SQL Server

Time:07-14

I have a source table as follow:

enter image description here

I want to get the below result:

enter image description here

EXPLANATION OF THE SOURCE TABLE:

This table contains requests with their related tasks on specific dates, each request can have multiple tasks and these tasks can occur many times, I marked each request and it related tasks with different color

For example, the request = 'NC2' in blue has 3 tasks:

  1. task1 occurs 3 times over time.
  2. task2 occurs 2 times over time.
  3. task3 occurs 1 time over time.

EXPLANATION OF THE RESULT TABLE:

The score indicates the number of tasks for each request based on time of occurrence.

For example, the request = 'NC2' in blue has 3 tasks:

SCORE COLUMN:

task1: first occurrence was on 1/5/2022, then it occurs on 1/7/2022 and lastly on 1/9/2022 "for the same request number"

, as marked on yellow, the same for task two it only occurs twis on 1/6/2022 and 1/8/2022 and for the task3 also.

You can see the score column contains the scores of the tasks based on their occurrence over time for specific request number.

STATUS COLUMN:

For the status it shows the first and last occurrence of a task for a specific request, and it ignores the middles like task1 marked in red it's not the first occurrence and neither the last, so it was ignored.

What I was able to achieve:

With the query below I was able to get SCORE column:

select RequestNumber, Task, StartDate, row_number() over(partition by RequestNumber, TaskName 
order by START_DATE) as score
from [SOURCE_TABLE] order by RequestNumber, START_DATE

for the STATUS I am lost, and I couldn't solve it, probably I am thinking inside the box, so any suggestion are welcome.

CodePudding user response:

A slightly more efficient version of @Larnu's excellent answer:

SELECT
  RequestNumber,
  Task,
  StartDate,
  ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) AS score,
  CASE WHEN ROW_NUMBER()    OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) = 1
         THEN 'First'
       WHEN LEAD(StartDate) OVER (PARTITION BY RequestNumber, TaskName ORDER BY StartDate) IS NULL
         THEN 'Last'
       END AS Status
FROM dbo.[SOURCE_TABLE];

It uses LEAD instead of another ROW_NUMBER. This means that the same windowing clause can be used, and therefore does not require another sort, nor a spool.

CodePudding user response:

One method would be to use a CASE expression on the value of score. As you'd need to check if the value of Score was the MAX value, then you'll need to use a CTE for this, like so:

WITH CTE AS
    (SELECT RequestNumber,
            Task,
            StartDate,
            ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) AS score
     FROM dbo.[SOURCE_TABLE])
SELECT RequestNumber,
       Task,
       StartDate,
       score,
       CASE score WHEN 1 THEN 'First'
                  WHEN MAX(score) OVER (PARTITION BY RequestNumber, TaskName) THEN 'Last'
       END AS Status
FROM CTE
ORDER BY RequestNumber,
         START_DATE;

Alternatively you could use ROW_NUMBER again, but in the opposite direction, and the you wouldn't need the CTE:

SELECT RequestNumber,
       Task,
       StartDate,
       ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) AS score,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE) = 1 THEN 'First'
            WHEN ROW_NUMBER() OVER (PARTITION BY RequestNumber, TaskName ORDER BY START_DATE DESC) = 1 THEN 'Last'
       END AS Status
FROM dbo.[SOURCE_TABLE];

CodePudding user response:

You are pretty close, I'd go for something like

SELECT
     RequestNumber
    ,Task
    ,StartDate
    ,AscRowNo AS Score
    ,CASE
        WHEN AscRowNo = 1 AND DescRowNo = 1 THEN '???'
        WHEN AscRowNo = 1 THEN 'First'
        WHEN DescRowNo = 1 THEN 'Last'
        ELSE NULL
    END AS Status
FROM (
    SELECT
         RequestNumber
        ,Task
        ,StartDate
        ,row_number() over(partition by RequestNumber, TaskName order by START_DATE) as AscRowNo
        ,row_number() over(partition by RequestNumber, TaskName order by START_DATE DESC) as DescRowNo
    from [SOURCE_TABLE]
) as x
ORDER BY
    RequestNumber,
    START_DATE

The two ROW_Numbers will identify the first and last occurrence of a task.

Note: what if there is only one occurrence globally? is it first or last?

  • Related