join equal or max-CodePudding

I want to join two tables by 'EmployeeId' and 'CalendarMonthId', but problem is that I want to connect record with the bigest CalednarMonthId if not exist equal. I use Azure Synapse SQL pool. Example data and code

create table emp_contr (
    EmployeeId int,
    CalendarMonthId int,
    IsDeleted int
    --login varchar(20) 
)

create table contr (
    ContractId int,
    EmployeeId int,
    CalendarMonthId int,
    value int
)

insert into emp_contr values (1, 202201, 0)
insert into emp_contr values (1, 202202, 0)
insert into emp_contr values (1, 202205, 0)
insert into emp_contr values (1, 202206, 0)
insert into emp_contr values (2, 202202, 0)
insert into emp_contr values (2, 202203, 0)

insert into contr values (1, 1, 202201, 5)
insert into contr values (2, 1, 202202, 2)
insert into contr values (40, 2, 202202, 2)
insert into contr values (50, 2, 202203, 0)

Base on this data I have problem with connect table: emp_contr, row: EmployeeId:1, CalendarMonthId:202205 with row from table contr from the same user and maximum CalendarMonthId:202202. I have query like this, but it doesn't work

select * 
from emp_contr ec
join contr c
    on ec.EmployeeId = c.EmployeeId
    and (ec.CalendarMonthId = c.CalendarMonthId or ec.CalendarMonthId > max(c.CalendarMonthId))
order by ec.EmployeeId

I expect result like this:

EmployeeId	Emp_Contr_CalendarMonthId	Contr_CalendarMonthId	Value	ContractId
1	202206	202202	2	2
1	202205	202202	2	2
1	202202	202202	2	2
1	202201	202201	5	1
2	202203	202203	0	50
2	202202	202202	2	40

I don't know what I should do. Maybe should I add row in temporary table based on contr table with the same values like bigest one but different CalendarMonthId

CodePudding user response：

TL;DR:

select ec.EmployeeId, ec.CalendarMonthId Emp_Contr_CalendarMonthId, 
  coalesce(c.CalendarMonthId,c2.CalendarMonthId) Contr_CalendarMonthId, 
  coalesce(c.value,c2.value) value, 
  coalesce(c.ContractId,c2.ContractId) ContractId,
-- additional columns for debugging
  c.CalendarMonthId, c.value, c.ContractId, 
  c2.CalendarMonthId, c2.value, c2.ContractId 
from emp_contr ec
left join contr c
on ec.EmployeeId = c.EmployeeId
and ec.CalendarMonthId = c.CalendarMonthId
join (
  select * 
  from (
    select EmployeeId,
         CalendarMonthId, 
         value, 
         ContractId,
         row_number() over (partition by EmployeeId order by CalendarMonthId desc) rn
    from contr
  ) x 
  where rn = 1
) c2 on ec.EmployeeId = c2.EmployeeId
order by ec.EmployeeId, ec.CalendarMonthId desc

SQL Fiddle

The coalesce assumes all relevant columns are NOT NULL. If this is not the case you'll have to replace them with CASE-expressions in which you check if c.CalendarMonthId is null.

Explanation

The statement joins contr twice.

First using an outer join to obtain the matching results when present, without loosing rows without a match.

The second join is an inner join. This assumes there is at least one contr row per emp_contr The join does join an inline view which uses the window function row_number() to filter only the last row per employee when ordered by CalendarMonthId

The last step is to combine the results, and pick the one from the first join if present and the one from the second join if not.