Club two table based on certain condition in postgresql-CodePudding

Here's my sample input tables:

employee_id	project	effective_date**
1	A	2014-08-13
1	B	2016-12-21
1	C	2018-02-21

employee_id	designation	effective_date
1	trainee	2014-08-05
1	senior	2016-08-17
1	team leader	2018-02-05

Table1: describes an employee who undergoes different projects at different date's in an organization.

Table2: describes the same employee from Table1 who undergoes different designation in the same organisation.

Now I want an Expected output table like this:

employee_id	project	designation	effective_date
1	A	trainee	2014-08-13
1	A	senior	2016-08-17
1	B	Senior	2016-12-21
1	B	team leader	2018-02-05
1	C	team leader	2018-02-21

The fact is that whenever:

his project changes, I need to display project effective_date.
his designation changes, I need to display designation effective_date but with the project he worked on during this designation change

CodePudding user response：

Let's do this way

WITH subquery AS (
    SELECT employee_id, project, effective_date as project_effective_date
    FROM Table1
    UNION
    SELECT employee_id, project, effective_date as project_effective_date
    FROM Table1
    JOIN (
        SELECT employee_id, MAX(effective_date) as max_date
        FROM Table1
        GROUP BY employee_id
    ) max_dates
    ON Table1.employee_id = max_dates.employee_id AND Table1.effective_date = max_dates.max_date
)
SELECT subquery.employee_id, subquery.project, Table2.designation, subquery.project_effective_date
FROM subquery
LEFT JOIN Table2
ON subquery.employee_id = Table2.employee_id AND subquery.project_effective_date >= Table2.effective_date
ORDER BY subquery.employee_id, subquery.project_effective_date, Table2.effective_date

CodePudding user response：

This problem falls into the gaps-and-islands taxonomy. This specific variant can be solved in three steps:

applying a UNION ALL of the two tables while splitting "tab1.project" and "tab2.role" in two separate fields within the same schema
compute the partitions, between a non-null value and following null values, with two running sums (one for the "designation" and one for "project")
apply two different aggregations on the two different fields, to remove the null values.

WITH cte AS (
    SELECT employee_id, effective_date,
           project        AS project, 
           NULL           AS role         FROM tab1
    UNION ALL 
    SELECT employee_id, effective_date, 
           NULL           AS project, 
           designation    AS role         FROM tab2
), cte2 AS (
    SELECT *,
           COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER(
               PARTITION BY employee_id 
               ORDER     BY effective_date
           ) AS project_partition,
           COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER(
               PARTITION BY employee_id 
               ORDER     BY effective_date
           ) AS role_partition
    FROM cte    
)
SELECT employee_id, effective_date,
       MAX(project) OVER(PARTITION BY project_partition) AS project,
       MAX(role)    OVER(PARTITION BY role_partition)    AS role
FROM cte2
ORDER BY employee_id, effective_date

Check the demo here.

CodePudding user response：

This answer is turned out to be very similar to @Lemon's answer. But still I will not delete, at the discretion of @Lemon. I think that is usefull. First, union all dates for Employee (designation and project changes). We have null values in table.

employee_id	designation	project	effective_date
1	trainee	null	2014-08-05
1	null	A	2014-08-13
1	senior	null	2016-08-17
1	null	B	2016-12-21
1	team leader	null	2018-02-05
1	null	C	2018-02-21

With method, discussed here, null values will be replaced properly in 2 partitions (project and designation).

with PD as(select employee_id, designation, null as project,effective_date from empDes
union all
select employee_id, null designation, project as project,effective_date from empProj
)
,PDranged as(
select *
    ,sum(case when project is null then 0 else 1 end)
           over(partition by employee_id order by effective_date) p_rn
    ,sum(case when designation is null then 0 else 1 end)
           over(partition by employee_id order by effective_date) d_rn
from PD
)
,Res as(
select * 
  ,first_value(project)
     over(partition by employee_id,p_rn order by effective_date) p_project
  ,first_value(designation)
     over(partition by employee_id,d_rn order by effective_date) p_designation
from  PDranged
)
select employee_id,p_project as project,p_designation as designation,effective_date 
from Res
order by effective_date

employee_id	project	designation	effective_date
1	null	trainee	2014-08-05
1	A	trainee	2014-08-13
1	A	senior	2016-08-17
1	B	senior	2016-12-21
1	B	team leader	2018-02-05
1	C	team leader	2018-02-21

Demo here