Home > Software engineering >  Why doesn't DISTINCT remove duplicates?
Why doesn't DISTINCT remove duplicates?

Time:10-22

I'm currently trying to run the below query.

SELECT DISTINCT friend_id, hobbies as friend_hobby
                FROM all_friends
                WHERE friend_id NOT IN 
                     (SELECT distinct(friend_id)
                      FROM all_friends
                      WHERE (friend_type = 'Good' OR friend_type = 'Great')
                      );

Expected Result

friend_id | friend_hobby
--------------------
123       | Basketball
--------------------
345       | Painting 

Actual Result

friend_id | friend_hobby
--------------------
123       | Basketball
--------------------
123       | Social Media
--------------------
123       | Reading
--------------------
123       | Hiking
--------------------
123       | Cooking
--------------------
345       | Painting 
--------------------
345       | Hiking 
--------------------
345       | Writing
--------------------
345       | Movies
--------------------
345       | Coding

Can anyone explain why the distinct clause doesn't return my Expected Result?

Is it because I'm also fetching friend_hobby, as well?

For note, I need to select both friend_id and hobbies, because I need to UNION with a Select statement fetching friend_id and hobbies. The hobbies in this select statement doesn't really matter, because they'll be replaced at some point.

What is the solution? I know adding group by at the end of the statement will remove the duplicate friend ids, but I'd prefer not to use group by without an aggregation occurring.

Thanks!

CodePudding user response:

The issue is that DISTINCT is based on the "group" that you define

SELECT DISTINCT friend_id, hobbies as friend_hobby

In the line above, you're saying to give you all distinct pairs of friend_id and hobbies (friend_hobby).

However, your expected results seems to be more like: fetch me the first tuple that has a distinct id (if that sentence makes sense at all).

In this case, I believe doing something like the following post should help with what you want: https://stackoverflow.com/a/42345019/5989233

CodePudding user response:

DISTINCT itself doesnt remove duplicated, you need group by as well

SELECT DISTINCT friend_id, friend_hobby from {whatever} group by friend_id 
  • Related