Home > Mobile >  How to get correct SUM() when joining table with duplicated row data?
How to get correct SUM() when joining table with duplicated row data?

Time:10-11

Currently I have 3 tables, the first table 'Users' contains id and user_name. The second table 'listings' contains refno and agent_id. And my third table 'logs' contains refno and status. Now I want to display the name of a person next to their status. So basically I want the count of status entries from logs and put their respective username next to it.

To do this, I have to reference refno of 'logs' to refno of 'listings' and the agent_id of 'listings' to id of 'Users'. For this I have used the following statement:

select SUM(CASE WHEN status = 'Draft' THEN 1 END) AS draft,
       SUM(CASE WHEN status = 'Publish' THEN 1 END) AS publish, 
       u.name
from logs t 
inner join listings l on t.refno = l.refno 
inner join users u on l.agent_id=u.id

But this returns an output like:

enter image description here

Which is wrong, the output I want is like this:

Draft Publish Name
1 1 Jason
0 1 Jam

I've added a sqlfiddle with data to make the reference easier to understand: http://sqlfiddle.com/#!9/22b6e4/5

CodePudding user response:

Update:
Based upon your comments below, you need to first create pseudo table inside the FROM clause that must contain all the necessary data in order to get your desired results.
The below subquery creates a pseudo table that contains all the required data.

SELECT u.id,
       u.name,
       t.status,
       t.refno
FROM logs t 
INNER JOIN listings l ON t.refno = l.refno 
INNER JOIN users u ON l.agent_id = u.id
GROUP BY t.refno, u.name, t.status;

You simply need to wrap the above query as a subquery inside the FROM clause of your original query.
So here is the final query that gets your desired output.

SELECT SUM(CASE WHEN tab.status = 'Draft' THEN 1 ELSE 0 END) AS draft,
       SUM(CASE WHEN tab.status = 'Publish' THEN 1 ELSE 0 END) AS publish,
       tab.name
FROM (SELECT u.id,
      u.name,
      t.status,
      t.refno
      FROM logs t 
      INNER JOIN listings l ON t.refno = l.refno 
      INNER JOIN users u ON l.agent_id = u.id
      GROUP BY t.refno, u.name, t.status) AS tab
GROUP BY tab.name
ORDER BY tab.id;

Original Answer:

You need to add a GROUP BY clause to group the result according to your desired parameter(s).
Here you can either group by l.agent_id or u.id.

Another thing that i noticed is that you need to add an ELSE clause to your SUM statement to return 0 in case an unexpected status is returned from the query.
Something like this:
SUM(CASE WHEN status = 'Publish' THEN 1 ELSE 0 END)

So your final query becomes something like this:

SELECT SUM(CASE WHEN status = 'Draft' THEN 1 ELSE 0 END) AS draft,
       SUM(CASE WHEN status = 'Publish' THEN 1 ELSE 0 END) AS publish, 
       u.name 
FROM logs t 
INNER JOIN listings l ON t.refno = l.refno 
INNER JOIN users u ON l.agent_id=u.id
GROUP BY u.id;

CodePudding user response:

The glaring problem to overcome is the fact that you have non-unique data in your listings table -- this is skewing your sums.

You need to join only on unique rows so that you don't count a subsequently joined row more than once.

SELECT u.id,
       u.name,
       SUM(status = 'Draft') AS draft,
       SUM(status = 'Publish') AS publish
FROM users AS u
JOIN (SELECT DISTINCT * FROM listings) AS l ON u.id = l.agent_id
JOIN logs AS t ON l.refno = t.refno
GROUP BY u.id

I prefer to include the id in the result set because names are often not unique.

http://sqlfiddle.com/#!9/22b6e4/48

  • Related