A new social network site has the following data tables:
Table users:
id | name | sex |
---|---|---|
1 | Ann | null |
2 | Steve | m |
3 | Mary | f |
4 | Brenda | f |
Table friends:
user1 | user2 |
---|---|
1 | 2 |
1 | 3 |
2 | 3 |
Select data that will be returned by the following SQL query:
SELECT users.name,
COUNT(*) as count
FROM users LEFT JOIN
friends ON users.id = friends.user1 OR users.id = friends.user2
WHERE users.sex = 'f'
GROUP BY users.id,
users.name;
Output:
name | count |
---|---|
Ann | 1 |
Ann | 2 |
Steve | 1 |
Steve | 2 |
Mary | 1 |
Mary | 2 ** |
Brenda | 0 |
Brenda | 1 ** |
The asterisked sections are the correct answers, but I don't quite understand why (Brenda, 1) would be an answer here when (Mary, 2) is also a correct answer.
CodePudding user response:
This is a trap question. The query looks like your are selecting all female users (sex = 'f'
) and count their friendships. For this a left outer join on the friends table is applied, so we keep all female users, no matter whether they have friendships or not.
But then the query uses COUNT(*)
, which counts the joined rows. That is 2 for Mary and 1 for Brenda. If we wanted to count friendships, we'd have to count a column of the friends table, e.g. COUNT(friends.user1)
. Thus the outer joined rows having these columns set to null would be omitted from the count, and we'd get 2 for Mary and 0 for Brenda.
CodePudding user response:
In order to address this problem and avoid the Brenda = 1
situation, you could:
- generate a single "friends" table, by putting together (user1, user2) couples with (user2, user1) couples
- joining this table with your "users" table by matching on "users.id" "cte.user1"
- aggregate with a
COUNT
function on "cte.user1" field, grouping on the user name.
WITH cte AS (
SELECT user1, user2 FROM friends
UNION ALL
SELECT user2, user1 FROM friends
)
SELECT users.name,
COUNT(cte.user1) AS cnt
FROM users
LEFT JOIN cte
ON users.id = cte.user1
GROUP BY users.name