Home > database >  get previous non-NULL value based on LAG function with dynamic offset
get previous non-NULL value based on LAG function with dynamic offset

Time:10-29

I have a tricky situation.

I have a source dataset; it has data for four employees and their departments based on an effective date.

I need to convert this source dataset to the destination dataset.

enter image description here

Both datasets are properly sorted by EmployeeName and EffectiveDate (ASC) order.

(Please see T-SQL scripts using temp table.)

CREATE TABLE #source

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100)

);

INSERT INTO #source

VALUES

('Lisa','2017-06-25','Catering'),
('Lisa','2018-08-17',NULL),
('Lisa','2021-12-05','Gardening'),
('Melissa','2015-08-27',NULL),
('Melissa','2017-11-29','Office'),
('Melissa','2020-10-10','Driving'),
('Melissa','2022-07-11',NULL),
('Omar','2019-01-03',NULL),
('Omar','2020-04-07','Retail'),
('Omar','2021-03-29',NULL),
('Pat', '2012-09-12','Laundry'),
('Pat', '2013-10-30',NULL),
('Pat', '2014-11-29',NULL),
('Pat', '2015-08-16',NULL),
('Pat', '2016-11-05',NULL)


CREATE TABLE #destination

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100),
PreviousNonNULLDepartmentIfAvailable varchar(100)

);

INSERT INTO #destination

VALUES

('Lisa','2017-06-25','Catering',NULL),
('Lisa','2018-08-17',NULL,'Catering'),
('Lisa','2021-12-05','Gardening','Catering'),
('Melissa','2015-08-27',NULL,NULL),
('Melissa','2017-11-29','Office',NULL),
('Melissa','2020-10-10','Driving','Office'),
('Melissa','2022-07-11',NULL,'Driving'),
('Omar','2019-01-03',NULL,NULL),
('Omar','2020-04-07','Retail',NULL),
('Omar','2021-03-29',NULL,'Retail'),
('Pat', '2012-09-12','Laundry',NULL),
('Pat', '2013-10-30',NULL,'Laundry'),
('Pat', '2014-11-29',NULL,'Laundry'),
('Pat', '2015-08-16',NULL,'Laundry'),
('Pat', '2016-11-05',NULL,'Laundry')



SELECT *
FROM #source
ORDER BY EmployeeName, EffectiveDate


SELECT *
FROM #destination
ORDER BY EmployeeName, EffectiveDate

In the destination dataset, I need one new column called [PreviousNonNULLDepartmentIfAvailable].

What is the logic to derive this above new column?

I need to get each individual's most recent (previous) department; it is easy to use a LAG function to get the most recent (previous) department. See T-SQL code below:

PreviousNonNULLDepartmentIfAvailable = LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate)

However, I need the most recent (previous) non-NULL department; if there is no such "most recent (previous) non-NULL" department value within the PARTITION of EmployeeName, then I need to show NULL.

I have tried options such as LAG, LAST_VALUE, IGNORE NULLS clause, UNBOUNDED PRECEDING clause. These options are close to what I need, but NOT exactly what I need.

Effectively, I need to get what a LAG function would perform; but the offset value for LAG function has to be dynamic, instead of a static value such as 1 or 2 or 3...; the LAG function needs to iterate (backwards) as many rows as needed to catch the most recent (previous) non-NULL department value, within a PARTITION of EmployeeName.

This said, the column [PreviousNonNULLDepartmentIfAvailable] can still have NULL values, if there is no such "most recent (previous) non-NULL" department value available within a PARTITION of EmployeeName.

Also, the first row based on ascending order of Effective Date of each partition of EmployeeName will always have NULL as its [PreviousNonNULLDepartmentIfAvailable] value (obviously). This is natural in the way LAG function works.

Any idea on how to convert the source dataset to destination dataset ?

CodePudding user response:

A combination of LAG, ROW_NUMBER, a Flag to check for CurrentDepartment with NULL values, a subquery to get the first row number with a CurrentDepartment that IS NOT NULL (based on RN and Flag), and a CASE statement to check if the Lag_Value IS NULL should do the job here:

WITH CTE AS (SELECT
               EmployeeName, EffectiveDate, CurrentDepartment,
               LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate) AS Lag_Value,
               ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate) AS RN,
               CASE WHEN CurrentDepartment IS NOT NULL THEN 1 ELSE 0 END AS Flag 
             FROM #source
)
SELECT EmployeeName, EffectiveDate, CurrentDepartment,
  CASE 
    WHEN Lag_Value IS NULL THEN PreviousDepartment
    ELSE Lag_Value 
  END AS PreviousNonNULLDepartmentIfAvailableFROM 
FROM (SELECT *,
  (SELECT TOP(1) CurrentDepartment FROM CTE WHERE Flag = 1 
  AND RN < a.RN AND EmployeeName = a.EmployeeName) AS PreviousDepartment
  FROM CTE a) a

Fiddle here.

UPDATED Fiddle here.

CodePudding user response:

SELECT

a.*,

c.PreviousNonBlankCurrentDepartment AS PreviousNonNULLDepartmentIfAvailable

FROM #source a

OUTER APPLY

(

SELECT TOP 1 b.CurrentDepartment AS PreviousNonBlankCurrentDepartment

FROM #source b
WHERE b.EmployeeName = a.EmployeeName
AND b.EffectiveDate < a.EffectiveDate
AND b.CurrentDepartment IS NOT NULL

ORDER BY b.EffectiveDate DESC

) c
  • Related