SUM CASE when DISTINCT?

Time:11-05

We're joining two tables and grouping the result. We want the sum of a per-user value, but each user's value should be counted only once, even if that user appears in a group multiple times.

Some sample tables:

user table:

| id | net_worth |
------------------
| 1  | 100       |
| 2  | 1000      |

visit table:

| id | location   | user_id |
-----------------------------
| 1  |  mcdonalds | 1       |
| 2  |  mcdonalds | 1       |
| 3  |  mcdonalds | 2       |
| 4  |  subway    | 1       |
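
For anyone who wants to reproduce this, a setup script along these lines should work. It is only a sketch: the column types and constraints are assumptions, since only the data itself is given above.

```sql
-- "user" is a reserved word in PostgreSQL and Redshift, so the table name is quoted.
-- Column types and constraints are assumed for illustration.
CREATE TABLE "user" (
  id        INT PRIMARY KEY,
  net_worth INT NOT NULL
);

CREATE TABLE visit (
  id       INT PRIMARY KEY,
  location VARCHAR(50) NOT NULL,
  user_id  INT NOT NULL REFERENCES "user" (id)
);

-- Sample rows copied from the tables above.
INSERT INTO "user" (id, net_worth) VALUES
  (1, 100),
  (2, 1000);

INSERT INTO visit (id, location, user_id) VALUES
  (1, 'mcdonalds', 1),
  (2, 'mcdonalds', 1),
  (3, 'mcdonalds', 2),
  (4, 'subway',    1);
```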

We want to find the total net worth of the users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we could use a SUM that only adds the net worth value if that user hasn't already been counted at that location. Something like this:

-- NOTE: Hypothetical query
SELECT
  location,
  SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
  JOIN user on user.id = visit.user_id
GROUP BY 1;

The ideal output being:

| location  | total_net_worth |
-------------------------------
| mcdonalds | 1100            |
| subway    | 100             |

This particular database is Redshift/PostgreSQL, but it would be interesting if there is a generic SQL solution. Is something like the above possible?

CodePudding user response:

You don't want to consider duplicate entries in the visits table. So, select distinct rows from the table instead.

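SELECT
  v.location,
  SUM(u.net_worth) AS total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v  -- one row per (location, user)
-- "user" is a reserved word in PostgreSQL, so the table name must be quoted
JOIN "user" u ON u.id = v.user_id
GROUP BY v.location
ORDER BY v.location;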
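
To see why this works, it may help to look at what the derived table returns on its own. The sketch below inlines the question's visit rows in a CTE so that it runs without any tables; the VALUES-in-CTE form is PostgreSQL syntax and may need adjusting on other databases.

```sql
-- Sample visit rows from the question, inlined so the query runs on its own.
WITH visit (id, location, user_id) AS (
  VALUES (1, 'mcdonalds', 1),
         (2, 'mcdonalds', 1),
         (3, 'mcdonalds', 2),
         (4, 'subway',    1)
)
SELECT DISTINCT location, user_id
FROM visit
ORDER BY location, user_id;

-- location  | user_id
-- mcdonalds | 1
-- mcdonalds | 2
-- subway    | 1
```

The repeated (mcdonalds, 1) visit collapses into a single row, so the join to the user table picks up each user's net worth once per location.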

CodePudding user response:

You can use a window function to get the unique users, then join that to the user table:

select v.location, sum(u.net_worth) as total_net_worth
from "user" u
  join (
    -- number each user's visits per location; rn = 1 keeps one row per (location, user_id)
    select location, user_id,
           row_number() over (partition by location, user_id order by id) as rn
    from visit
  ) v on v.user_id = u.id and v.rn = 1
group by v.location;

The above is standard ANSI SQL. In Postgres, this can also be expressed using distinct on ():

select v.location, sum(u.net_worth)
from "user" u
  join (
    select distinct on (user_id, location) *
    from visit
    order by user_id, location, id
  ) v on v.user_id = u.id
group by v.location;

CodePudding user response:

You can join the user table to the distinct (location, user_id) combinations, as in the generic SQL below.

SELECT v.location, SUM(u.net_worth) AS total_net_worth
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
  -- "user" is a reserved word, so the table name is quoted
  JOIN "user" u ON u.id = v.user_id
GROUP BY v.location;
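
A shortcut that sometimes gets suggested for this kind of problem is SUM(DISTINCT u.net_worth). It happens to give the expected result on the question's sample data, but it deduplicates values rather than users, so two different users with the same net worth would be counted once. The sketch below illustrates this; user 3 and the extra visit are hypothetical additions, and the VALUES-in-CTE form is PostgreSQL syntax.

```sql
-- Inline data so the query runs on its own (PostgreSQL syntax).
-- User 3 and their visit are hypothetical additions to the question's sample data.
WITH u (id, net_worth) AS (
  VALUES (1, 100), (2, 1000), (3, 100)
),
v (location, user_id) AS (
  VALUES ('mcdonalds', 1), ('mcdonalds', 1), ('mcdonalds', 2), ('mcdonalds', 3)
)
SELECT
  v.location,
  SUM(DISTINCT u.net_worth) AS total_net_worth  -- deduplicates values, not users
FROM v
  JOIN u ON u.id = v.user_id
GROUP BY v.location;

-- Returns 1100 for mcdonalds; the intended total is 1200 (100 + 1000 + 100).
```

Deduplicating the (location, user_id) pairs before the join, as above, avoids this.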