I am having trouble understanding why my two SQL queries output different results for the count of senior managers when I expected them to be the same.
SELECT COMPANY.Company_Code, COMPANY.Founder, COUNT(SENIOR_MANAGER.Senior_Manager_Code)
FROM COMPANY INNER JOIN SENIOR_MANAGER
ON SENIOR_MANAGER.COMPANY_CODE = COMPANY.COMPANY_CODE
GROUP BY COMPANY.COMPANY_CODE, COMPANY.FOUNDER;
This SQL query is trying to find out how many senior managers there are in different companies, here are the results:
C1 Angela 5
C10 Earl 2
C100 Aaron 4
C11 Robert 1
C12 Amy 6
However, when I use the same condition in another query where I use two inner joins I get another set of results. Here is my query:
SELECT COMPANY.COMPANY_CODE, COMPANY.FOUNDER, COUNT(LEAD_MANAGER.LEAD_MANAGER_CODE),
COUNT(SENIOR_MANAGER.SENIOR_MANAGER_CODE)
FROM COMPANY INNER JOIN LEAD_MANAGER
ON COMPANY.COMPANY_CODE = LEAD_MANAGER.COMPANY_CODE
INNER JOIN SENIOR_MANAGER
ON SENIOR_MANAGER.COMPANY_CODE = COMPANY.COMPANY_CODE
GROUP BY COMPANY.COMPANY_CODE, COMPANY.FOUNDER;
Here are the results:
C1 Angela 10 10
C10 Earl 2 2
C100 Aaron 8 8
C11 Robert 1 1
C12 Amy 12 12
The fourth column is the count of the senior managers & has the same values as the third column for some reason but has different values from the first query I showed. Can someone explain why the results are different I suspect it could be because I am using the inner joins incorrectly?
The desired result I expect is for the fourth column to have the values from the first query shown.
CodePudding user response:
Your question is missing information on the data (which is probably why you received a downvote), but the answer seems reasonably clear.
Each time you join a new table, each row in the result set is multiplied by the number of times the join condition is matched in the new table for that row.
So your results show that each of C1, C100 and C12 has two LEAD_MANAGERs.
CodePudding user response:
Since counts of your two-join query reflects multiples of the one-join query, you are double counting the previously distinct pairings with new join.
Specifically, in first query COMPANY
and SENIOR_MANAGER
tables can have a one-to-many relationship where a distinct company can have multiple senior managers.
However, in second query when including LEAD_MANAGER
table which would have even more distinct pairings to COMPANY
, you repeat SENIOR_MANAGER
multiple times for every corresponding pairing of COMPANY
and LEAD_MANAGER
.
As a solution to the double counting, consider joining using aggregate subqueries (or CTEs):
SELECT agg_s.COMPANY_CODE,
agg_s.FOUNDER,
agg_s.SENIOR_MANAGER_COUNT.
agg_l.LEAD_MANAGER_COUNT
FROM
(SELECT c.COMPANY_CODE,
c.FOUNDER,
COUNT(s.SENIOR_MANAGER_CODE) AS SENIOR_MANAGER_COUNT
FROM COMPANY c
INNER JOIN SENIOR_MANAGER s
ON s.COMPANY_CODE = c.COMPANY_CODE
GROUP BY c.COMPANY_CODE,
c.FOUNDER
) AS agg_s
INNER JOIN
(SELECT c.COMPANY_CODE,
c.FOUNDER,
COUNT(l.LEAD_MANAGER_CODE) AS LEAD_MANAGER_COUNT
FROM COMPANY c
INNER JOIN LEAD_MANAGER l
ON c.COMPANY_CODE = l.COMPANY_CODE
GROUP BY c.COMPANY_CODE,
c.FOUNDER
) AS agg_l
ON agg_s.COMPANY_CODE = agg_l.COMPANY_CODE
AND agg_s.FOUNDER = agg_l.FOUNDER
Even better, consider a table design change for proper normalization where you maintain a single MANAGER
table with TYPE
indicator column for SENIOR
or LEAD
. Then run a single query with conditional aggregation:
SELECT c.COMPANY_CODE,
c.FOUNDER,
COUNT(CASE WHEN m.TYPE = 'SENIOR' THEN 1 ELSE NULL END) AS SENIOR_MANAGER_COUNT,
COUNT(CASE WHEN m.TYPE = 'LEAD' THEN 1 ELSE NULL END) AS LEAD_MANAGER_COUNT
FROM COMPANY c
INNER JOIN MANAGER m
ON c.COMPANY_CODE = m.COMPANY_CODE
GROUP BY c.COMPANY_CODE,
c.FOUNDER;