Home > database >  Club two table based on certain condition in postgresql
Club two table based on certain condition in postgresql

Time:01-23

Here's my sample input tables:

employee_id project effective_date**
1 A 2014-08-13
1 B 2016-12-21
1 C 2018-02-21
employee_id designation effective_date
1 trainee 2014-08-05
1 senior 2016-08-17
1 team leader 2018-02-05

Table1: describes an employee who undergoes different projects at different date's in an organization.

Table2: describes the same employee from Table1 who undergoes different designation in the same organisation.

Now I want an Expected output table like this:

employee_id project designation effective_date
1 A trainee 2014-08-13
1 A senior 2016-08-17
1 B Senior 2016-12-21
1 B team leader 2018-02-05
1 C team leader 2018-02-21

The fact is that whenever:

  • his project changes, I need to display project effective_date.
  • his designation changes, I need to display designation effective_date but with the project he worked on during this designation change

CodePudding user response:

Let's do this way

WITH subquery AS (
    SELECT employee_id, project, effective_date as project_effective_date
    FROM Table1
    UNION
    SELECT employee_id, project, effective_date as project_effective_date
    FROM Table1
    JOIN (
        SELECT employee_id, MAX(effective_date) as max_date
        FROM Table1
        GROUP BY employee_id
    ) max_dates
    ON Table1.employee_id = max_dates.employee_id AND Table1.effective_date = max_dates.max_date
)
SELECT subquery.employee_id, subquery.project, Table2.designation, subquery.project_effective_date
FROM subquery
LEFT JOIN Table2
ON subquery.employee_id = Table2.employee_id AND subquery.project_effective_date >= Table2.effective_date
ORDER BY subquery.employee_id, subquery.project_effective_date, Table2.effective_date

CodePudding user response:

This problem falls into the gaps-and-islands taxonomy. This specific variant can be solved in three steps:

  • applying a UNION ALL of the two tables while splitting "tab1.project" and "tab2.role" in two separate fields within the same schema
  • compute the partitions, between a non-null value and following null values, with two running sums (one for the "designation" and one for "project")
  • apply two different aggregations on the two different fields, to remove the null values.
WITH cte AS (
    SELECT employee_id, effective_date,
           project        AS project, 
           NULL           AS role         FROM tab1
    UNION ALL 
    SELECT employee_id, effective_date, 
           NULL           AS project, 
           designation    AS role         FROM tab2
), cte2 AS (
    SELECT *,
           COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER(
               PARTITION BY employee_id 
               ORDER     BY effective_date
           ) AS project_partition,
           COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER(
               PARTITION BY employee_id 
               ORDER     BY effective_date
           ) AS role_partition
    FROM cte    
)
SELECT employee_id, effective_date,
       MAX(project) OVER(PARTITION BY project_partition) AS project,
       MAX(role)    OVER(PARTITION BY role_partition)    AS role
FROM cte2
ORDER BY employee_id, effective_date

Check the demo here.

CodePudding user response:

This answer is turned out to be very similar to @Lemon's answer. But still I will not delete, at the discretion of @Lemon. I think that is usefull. First, union all dates for Employee (designation and project changes). We have null values in table.

employee_id designation project effective_date
1 trainee null 2014-08-05
1 null A 2014-08-13
1 senior null 2016-08-17
1 null B 2016-12-21
1 team leader null 2018-02-05
1 null C 2018-02-21

With method, discussed here, null values will be replaced properly in 2 partitions (project and designation).

with PD as(select employee_id, designation, null as project,effective_date from empDes
union all
select employee_id, null designation, project as project,effective_date from empProj
)
,PDranged as(
select *
    ,sum(case when project is null then 0 else 1 end)
           over(partition by employee_id order by effective_date) p_rn
    ,sum(case when designation is null then 0 else 1 end)
           over(partition by employee_id order by effective_date) d_rn
from PD
)
,Res as(
select * 
  ,first_value(project)
     over(partition by employee_id,p_rn order by effective_date) p_project
  ,first_value(designation)
     over(partition by employee_id,d_rn order by effective_date) p_designation
from  PDranged
)
select employee_id,p_project as project,p_designation as designation,effective_date 
from Res
order by effective_date
employee_id project designation effective_date
1 null trainee 2014-08-05
1 A trainee 2014-08-13
1 A senior 2016-08-17
1 B senior 2016-12-21
1 B team leader 2018-02-05
1 C team leader 2018-02-21

Demo here

  • Related