Home > database >  How to select only rows with 2 consecutive "Yes" values ordered by year in SQL?
How to select only rows with 2 consecutive "Yes" values ordered by year in SQL?

Time:09-09

I have a query to return sample values for each employee per calendar year, and a column that checks (yes/no) if the sample value is >= 60,000.

My initial data:

Employee_ID Calendar_Year Sample_Value  Sample_Check
1234        2020          55,000        No
1234        2021          70,000        Yes
1234        2022          50,000        No
3456        2020          80,000        Yes
3456        2021          40,000        No
3456        2022          65,000        Yes
5678        2020          30,000        No
5678        2021          70,000        Yes
5678        2022          90,000        Yes

I would like to get this result, because this employee is the only one with "yes" for 2 consecutive calendar years.

Employee_ID Calendar_Year Sample_Value  Sample_Check
5678        2022          90,000        Yes

I have looked up similar questions but could not find something that solves my issue. I have also looked into LAG and LEAD but need help in understanding if they can give me the result I want.

CodePudding user response:

I'm not sure how bullet proof this is. I used the lag function in a window partition to get the prior Sample_Check. I then matched on the outer query to get the record that (basically shows yes = yes). If you had 3 (Yes) in a row then it would pull back 2. You might be able to use some conditional logic to offset the rows if you ran into that scenario

SELECT
*
FROM
(
SELECT Employee_ID
      ,Calendar_Year
      ,Sample_Value
      ,Sample_Check
      , LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
  FROM EMPLOYEETABLE
 
) X 
WHERE Sample_Check = LagSampleCheck1
ORDER BY Employee_ID ASC, Calendar_Year ASC 

I also created this one that does another row_number() Over (Partition BY Employee ID and Order by Calendar year so it picks up the latest year if you have a situation where you have more than one that meet that criteria. I added another record to your original data set (Employee ID 5678, Calendar Year 2023, Samples Value and Sample Check Yes) too create two records.

Employee_ID Calendar_Year   Sample_Value    Sample_Check
1234        2020        55,000      No        
1234        2021        70,000      Yes       
1234        2022        50,000      No        
3456        2020        80,000      Yes       
3456        2021        40,000      No        
3456        2022        65,000      Yes       
5678        2020        30,000      No        
5678        2021        70,000      Yes       
5678        2022        90,000      Yes       
5678        2023        90,000      Yes

  

SELECT
*
FROM
(
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CALENDAR_YEAR DESC) AS ROWCOUNTER
FROM
(
SELECT Employee_ID
      ,Calendar_Year
      ,Sample_Value
      ,Sample_Check
      , LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
  FROM EMPLOYEETABLE
 
) X 
WHERE Sample_Check = LagSampleCheck1
) Z
WHERE ROWCOUNTER = 1
ORDER BY Employee_ID ASC, Calendar_Year ASC

CodePudding user response:

I would tend towards using a correlated query to find qualifying rows, followed by a row_number window to select the greatest/least of each group you require:

with v as (
  select *, 
  case when exists (
    select * from t t2 
    where t2.Employee_ID = t.Employee_ID 
      and t.Sample_Check = 'Yes' 
      and t2.Sample_Check = 'Yes' 
      and t2.Calendar_Year = t.Calendar_Year - 1
    ) then 1 else 0 end valid
    from t
), s as (
  select *,
    Row_Number() over(partition by Employee_ID, valid order by Calendar_Year desc) rn
  from v
)
select Employee_Id, Calendar_Year, Sample_Value, Sample_Check
from s
where valid = 1 and rn = 1;

CodePudding user response:

This is the most straightforward solution . Just join the table with itself ( assuming calendar year is numeric )

SELECT t1.*, t2.sample_check
FROM data AS t1, data AS t2
WHERE t1.emp_id = t2.emp_id
  AND t1.calendar_year = t2.calendar_year   1
  AND t1.sample_check = t2.sample_check
  AND t1.sample_check = 'Yes'

test it

Also you can get same result with lag function with this;

WITH temp AS (SELECT emp_id
                 , calendar_year
                 , sample_value
                 , sample_check
                 , lag( CASE WHEN sample_check = 'Yes' THEN 1 ELSE 0 END, 1 )
                   OVER (PARTITION BY emp_id ORDER BY calendar_year) AS prevcheck
            FROM data)
SELECT *
FROM temp
WHERE prevcheck = 1
  AND sample_check = 'Yes'

Both gives the same result

emp_id  calendar_year   sample_value    sample_check    prevcheck

5678      2022             90             Yes              1

test it

  • Related