Home > Back-end >  How to avoid duplicated rows when joining multiple tables?
How to avoid duplicated rows when joining multiple tables?

Time:12-20

I am trying to SELECT the data from the education and experience tables. Both have two entries for the given candidate_id. When I try using GROUP BY and json_agg, I get four rows in the aggregated JSON values. What am I doing wrong? I want two education objects and two experience objects in their respective arrays.

SQL:

SELECT 
    json_agg(education) as education,
    json_agg(experience) as experience
FROM application
LEFT JOIN candidate ON application.candidate_id = candidate.id
LEFT JOIN education ON candidate.id = education.candidate_id
LEFT JOIN experience ON candidate.id = experience.candidate_id
WHERE application.candidate_id = 2
GROUP BY education.candidate_id, experience.candidate_id;

Result:

education
[{"id":3,"candidate_id":2,"school":"school1 candidate2","qualification":"qualification1 candidate2","dates":"dates1 candidate2","note":null}, 
 {"id":3,"candidate_id":2,"school":"school1 candidate2","qualification":"qualification1 candidate2","dates":"dates1 candidate2","note":null}, 
 {"id":4,"candidate_id":2,"school":"school2 candidate2","qualification":"qualification2 candidate2","dates":"dates2 candidate2","note":null}, 
 {"id":4,"candidate_id":2,"school":"school2 candidate2","qualification":"qualification2 candidate2","dates":"dates2 candidate2","note":null}]

experience
[{"id":3,"candidate_id":2,"employer":"emploer1 candidate2","title":"title1 candidate2","dates":"dates1 candidate2","job_duties":"duties1 candidate2"}, 
 {"id":4,"candidate_id":2,"employer":"emploer2 candidate2","title":"title2 candidate2","dates":"dates2 candidate2","job_duties":"duties2 candidate2"}, 
 {"id":3,"candidate_id":2,"employer":"emploer1 candidate2","title":"title1 candidate2","dates":"dates1 candidate2","job_duties":"duties1 candidate2"}, 
 {"id":4,"candidate_id":2,"employer":"emploer2 candidate2","title":"title2 candidate2","dates":"dates2 candidate2","job_duties":"duties2 candidate2"}]

I tried multiple variants of this query ...

CodePudding user response:

Multiple joins that do not (also) associate rows among the joined table rows effectively act like CROSS JOIN by proxy, multiplying rows. See:

Aggregate before joining (so that only a single row per parent row remains, hence no duplication). Or use lowly correlated subqueries for this simple case. Well, not even correlated for just your single candidate_id, rather plain suquery expressions in the SELECT list:

SELECT (SELECT json_agg(e.*)
        FROM   education e
        WHERE  e.candidate_id = 2)  AS education
     , (SELECT json_agg(e.*)
        FROM   experience e
        WHERE  e.candidate_id = 2)  AS experience
WHERE  EXISTS (SELECT FROM application a WHERE a.candidate_id = 2);

I removed the table candidate from your query, which was dead freight (unless you must verify that a related row exists in that table), but might additionally multiply rows in the same way.

And the table application only needs to be checked for the existence of any qualifying rows.

You might alternatively use (LATERAL) subqueries for more complex cases. (I suspect you over-simplified.) See:

  • Related