Home > Blockchain >  LAG function not returning desired results
LAG function not returning desired results

Time:09-30

I have this table:

CREATE TABLE [dbo].[testtable](
[EmpID] [int] NOT NULL,
[Status] [nvarchar](5) NOT NULL,
[History] [nvarchar](5) NOT NULL,
[EntryDate] DateTime NOT NULL

)

INSERT INTO [dbo].[testtable]
       ([EmpID]
       ,[Status]
       ,[History]
       ,EntryDate
       )
 VALUES
       (1,'N','OLD','2022-03-01 13:00'),
       (1,'C','OLD','2022-03-01 16:00'),
       (1,'C','OLD','2022-04-01 16:00'),
       (1,'T','CUR','2022-05-01 08:00'),
       (2,'N','OLD','2022-04-01 16:00'),
       (2,'R','OLD','2022-05-01 07:00'),
       (2,'F','OLD','2022-06-01 15:00'),
       (2,'S','CUR','2022-07-01 14:00'),
       (3,'N','CUR','2022-03-01 17:00'),
       (4,'N','OLD','2022-05-01 16:00'),
       (4,'F','OLD','2022-06-01 11:00'),
       (4,'G','OLD','2022-07-01 20:00'),
       (4,'G','CUR','2022-08-01 19:00')
    
  

In my current output, it seems the beginning first record of a different ID selects the prior status of the previous EMPID.EMPID3 will not be included since they have no change

SELECT 
    EMPID, FromSt, ToSt, History
FROM 
    (SELECT 
         EMPID,
         ISNULL(LAG(Status) OVER (ORDER BY EMPID ASC), 'N') AS FromSt,
         Status AS ToSt,
         History
     FROM  
         [dbo].[testable]
     -- WHERE History = 'CUR'
    ) InnerQuery
WHERE 
    FromSt <> ToSt

Output:

       EMPID    FromSt  ToSt    
       ---------------------
       1         N      C         
       1         C      T         
       2         T      N        
       2         N      R         
       2         R      F         
       2         F      S         
       3         S      N         
       4         N      F         
       4         F      G        

where each EmpID will go through various status changes. The Oldest per EMPID records will always have a Value of STATUS 'N' and OLD in History and the latest record will always have a History value of 'CUR'

Scenario #1

I would like the output to show only when there is a change between records as below when I select all records

       EmpID  FromSt  ToSt   
       -------------------
        1       N       C     
        1       C       T     
        2       N       R     
        2       R       F    
        2       F       S  
        4       N       G  

Scenario #2

If I only select 'CUR' I want the output to choose the most current status that is different from the current one and where there is a status change. So again EMPID3 would not be included

       EmpID    FromSt    ToSt   
       -----------------------
       1        C          T   
       2        F          S
       2        R          S
       4        F          G

CodePudding user response:

There are several problems with the code.

In order to evaluate LAG separately for each EMPID, you need to include the PARTITION BY clause in the LAG function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver16

You need to chose what order to sort the data for LAG to use. There doesn't appear to be anything in your data (a date, perhaps) that would enable you to choose the order. Without this, the results won't be consistent as SQL Server doesn't guarantee the the results are returned in any particular order unless you tell it to. Plus LAG requires an ORDER BY clause.

CodePudding user response:

Thanks to dougp advice. here is the solution

SELECT
    EMPID,FromSt,ToSt,History
    FROM (SELECT EMPID,
    ISNULL(LAG(Status) OVER(PARTITION BY Empid  ORDER BY EmpID,EntryDate ),'N')As FromSt,  
    Status AS ToSt,
    History
    FROM  [dbo].[testtable]
    --WHERE  History='CUR'
          

    ) InnerQuery
    WHERE FromSt<>ToSt 
  • Related