Home > Software design >  SQL Rowwise comparison between groups
SQL Rowwise comparison between groups

Time:08-05

Question

The following is a snippet of my data:

Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);

Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');

db<>fiddle.

I want to use T-SQL to get a count of how many persons fulfil the following criteria at least once - multiples should also count as one:

For a person:

  1. One of the dates in 'started' (say s1) is larger than at least one of the dates in 'ended' (say e1)
  2. s1 and e1 are in the same year, to be set manually - e.g. '2021-01-01' until '2022-01-01'

Example expected response

If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:

s1 = '2016-10-11', and e1 = '2016-10-10'.

Why can't I do this myself

The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.

CodePudding user response:

You said you plan to set the dates manually, so this works where we set the start date in one CTE, and the end date in another CTE. Then we calculate the min/max for each, and use that criteria in the query where statement.

with min_max_start as (
  select person, 
    min(started) as min_start, --obsolete
    max(started) as max_start
  from emps
  where started >= '2016-01-01'
  group by person
  ), 
  min_max_end as (
  select person, 
    min(stopped) as min_stop,
    max(stopped) as max_stop --obsolete
  from emps
  where stopped < '2017-01-01'
  group by person
  )
select count(distinct e.person)
from emps e
join min_max_start mms
  on e.person = mms.person
join min_max_end mme
  on e.person = mme.person
where mms.max_start> mme.min_stop

Output: 1

CodePudding user response:

Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.

select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01' 
           then started end) > 
       min(case when stopped >= '2016-01-01' and stopped < '2017-01-01' 
           then stopped end);

Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833

CodePudding user response:

Try the following:

With CTE as 
(
 Select D.person, D.started, T.stopped, 
       case
         when Year(D.started) = Year(T.stopped) and D.started > T.stopped 
           then 1 
           else 0 
       end as chk
 From
 (Select person, started From Emps Where started >= '2016-01-01') D
 Join
 (Select person, stopped From Emps Where stopped <= '2017-01-01') T
 On D.person = T.person
)

Select Count(Distinct person) as CNT
From CTE
Where chk = 1;

To get the employee list who met the criteria use the following on the CTE instead of the above Select Count... query:

Select person, started, stopped
From CTE
Where chk = 1;

See a demo from db<>fiddle.

  • Related