I have these records in the table - employee_projects
id | employee_id | project_id | status |
---|---|---|---|
1 | emp1 | proj1 | VERIFIED |
2 | emp2 | proj2 | REJECTED |
3 | emp1 | proj1 | VERIFIED |
4 | emp1 | proj3 | REJECTED |
5 | emp2 | proj2 | REQUIRED |
6 | emp3 | proj4 | SUBMITTED |
7 | emp4 | proj5 | VERIFIED |
8 | emp4 | proj6 | VERIFIED |
9 | emp3 | proj4 | REQUIRED |
Here are the criteria for determining duplicates:
- Same employee ID, same project ID under the same status (Example: rows 1 and 3 are duplicates)
- Same employee ID, same project ID but in different status (Example: rows 6 and 9 are duplicates). An exception to duplication criteria#2 is if one project is REQUIRED and the same project is also REJECTED under the same employee, this is NOT considered a duplicate. For example, rows 2 and 5 are NOT duplicates.
I have a query for the first criterion:
select
emp_id,
proj_id,
status,
COUNT(*)
from
employee_projects
group by
emp_id,
proj_id,
status
having
COUNT(*) > 1
What I'm struggling to construct is the SQL for the second criterion.
CodePudding user response:
maybe a self join can help you.
with t (employee_id ,project_id,status)
as
(
select 'emp1', 'proj1' , 'VERIFIED'
Union all select 'emp2', 'proj2' , 'REJECTED'
Union all select 'emp1', 'proj1' , 'VERIFIED'
Union all select 'emp1', 'proj3' , 'REJECTED'
Union all select 'emp2', 'proj2' , 'REQUIRED'
Union all select 'emp3', 'proj4' , 'SUBMITTED'
Union all select 'emp4', 'proj5' , 'VERIFIED'
Union all select 'emp4', 'proj6' , 'VERIFIED'
Union all select 'emp3', 'proj4' , 'REQUIRED'
)
select
t.employee_id,
t.project_id,
t.status,
'' as status,
'criteria#1' as SQL
from
t
group by
t.employee_id,
t.project_id,
t.status
having
COUNT(*) > 1
union all
SELECT
t.employee_id,
t.project_id,
t.status,
a.status,
'criteria#2' as SQL
FROM
t
left join t as a on
t.employee_id = a.employee_id and
t.project_id = a.project_id
where
t.status != a.status and
concat(t.status,a.status) != 'REQUIREDREJECTED' and
concat(t.status,a.status) != 'REJECTEDREQUIRED'
CodePudding user response:
Try the following:
select T.emp_id, T.proj_id, T.status, D.dup_cnt
from employee_projects T join
(
select emp_id, proj_id, count(*) as dup_cnt
from employee_projects
group by emp_id, proj_id
having count(*) > 1 and
count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2
) D
on T.emp_id = D.emp_id and T.proj_id = D.proj_id
order by T.emp_id, T.proj_id
If you want to consider an employee with statuses ('REQUIRED', 'REJECTED', any other statuses) as duplicate, modify the having clause as the following:
select T.emp_id, T.proj_id, T.status, D.dup_cnt
from employee_projects T join
(
select emp_id, proj_id, count(*) as dup_cnt
from employee_projects
group by emp_id, proj_id
having count(*) > 1 and
(count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2 or count(distinct status) > 2)
) D
on T.emp_id = D.emp_id and T.proj_id = D.proj_id
order by T.emp_id, T.proj_id
See a demo.