Home > front end >  INNER JOIN with aggregate functions in my SELECT
INNER JOIN with aggregate functions in my SELECT

Time:11-25

I'm trying to join a new column to my current query that uses aggregate functions. I create this column with a new query that also uses an aggregate function from a different table but I'm not sure if a JOIN will work for me since I need to join it to its respective row.

TABLE A (employees that are enrolled or were enrolled in a project)

ID DEPARTMENT ENROLLED PROJECT
1 MARKETING Yes ARQ
2 MARKETING Yes TC
3 MARKETING No ARQ
4 MARKETING No TC
5 FINANCE Yes ARQ
6 FINANCE Yes TC
7 FINANCE No ARQ
8 FINANCE Yes TC

This table has more departments and more projects, but I simplified.

TABLE B (relation with departments and employees)

ID DEPARTMENT TOTAL_EMPLOYEES
1 MARKETING 2
2 MARKETING 3
3 FINANCE 4
4 FINANCE 8

In my first query I was asked to achieve the following result - using only table A:

             (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL
MARKETING 1 1 2 1 1 2 4
FINANCE 1 1 2 1 1 2 4

Using the following query:

SELECT  tableA.department, 
        sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total               
FROM    tableA
GROUP BY tableA.department;

My second query gets departments and their total employees from table B:

DEPARTMENT TOTAL_EMPLOYEES
MARKETING 5
FINANCE 12

Using the following query:

SELECT  tableB.department,
        sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department;

I need to add the column TOTAL_EMPLOYEES to my first query, next to TOTAL will be TOTAL_EMPLOYEES. But it has to be placed with its respective department row. I need this to compare this 2 columns and see how many employees were not assigned to any project.

This is my expected result.

                  (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL T_EMPL
MARKETING 1 1 2 1 1 2 4 5
FINANCE 1 1 2 1 1 2 4 12

I have tried to achieve this using the following query:

SELECT  tableA.department, 
        sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total,
        sum (tableB.total_employees) AS T_EMPL               
FROM    tableA
JOIN    tableB
ON tableA.department = tableB.department
GROUP BY tableA.department;

But the numbers I get in my query are completely wrong since the JOINS repeat my rows and my SUMS duplicate.

I don't know if I really need to use a join or a subquery to place my sum(tableB.department) in its respective row.

I'm using PostgreSQL but since I'm using Standard 92 any SQL solution will help.

CodePudding user response:

Your main issue stemmed from inadvertently multiplying rows with the join, and has already been addressed. See:

But use the standard SQL aggregate FILTER clause. It's shorter, cleaner, and noticeably faster. See:

SELECT *
FROM  (
   SELECT department
        , count(*) FILTER (WHERE enrolled AND project = 'ARQ')     AS arq_e
        , count(*) FILTER (WHERE enrolled AND project = 'TC')      AS tc_e
        , count(*) FILTER (WHERE enrolled)                         AS total_enrolled
        , count(*) FILTER (WHERE NOT enrolled AND project = 'ARQ') AS arq_n
        , count(*) FILTER (WHERE NOT enrolled AND project = 'TC')  AS tc_n
        , count(*) FILTER (WHERE NOT enrolled)                     AS total_not_enrolled
        , count(*) AS total
   FROM   tableA a
   GROUP  BY 1
   ) a
LEFT JOIN  (  -- !
   SELECT department
        , sum(total_employees) AS total_employees
   FROM   tableB b
   GROUP  BY 1
   ) b USING (department);

enrolled should be a boolean column. Make it so if it isn't. Then you can use it directly. Smaller, faster, cleaner, shorter code.

I replaced the [INNER] JOIN with a LEFT [OUTER] JOIN on a suspicion. Typically, you want to keep all results, even if the same department is not found in the other table. Maybe even a FULL [OUTER] JOIN?

Also, USING (department) as join condition conveniently outputs that column only once, so we can make do with SELECT * in the outer SELECT.

Finally, subqueries are shorter and faster than CTEs. Not much since Postgres 12, but still. See:

CodePudding user response:

Join the results of the two queries, using sub-queries, don't join the tables.

That way you're joining 1 row of enrollment data per department to 1 row of employee data per department.

SELECT
  *
FROM
(
  SELECT  tableA.department, 
          sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
          sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
          sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
          sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
          sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
          sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
          count (*) AS Total
               
  FROM    tableA
  GROUP BY tableA.department
)
  AS enroll
INNER JOIN
(
  SELECT  tableB.department,
          sum(tableB.total_employees) AS Total_EMPLOYEES
  FROM tableB
  GROUP BY tableB.department
)
  AS employee
    ON employee.department = enroll.department

CodePudding user response:

As the join will multiply the summ, you can first sum the values and then join them

WITH CTE1 as (SELECT  tableA.department, 
        sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total
               
FROM    tableA
GROUP BY tableA.department),
CTE2 as (SELECT  tableB.department,
        sum(tableB.total_employees) AS TOTAL_EMPLOYEES

FROM tableB
GROUP BY tableB.department)
SELECT
CTE1.department,    ARQ_E,  TC_E,   TOTAL_ENROLLED,     ARQ_N,  TC_N,   TOTAL_NOT_ENROLLED,     TOTAL,  T_EMPL,CTE2.TOTAL_EMPLOYEES
FROM CTE1 JOIN CTE2 ON CTE1.department = CTE2.department
  • Related