How to debug a SQL natural join resulting in additional rows?-CodePudding

I am struggling to debug a SQL join. I could brute-force it, but I am sure there is tooling for this.

I have 5 tables of that should "just" left join into each other on the common column names.

Something like

SELECT * 
FROM users u -- user_id's are unique
JOIN user_groups ug ON ug.user_id = u.user_id 
JOIN group_importance gi ON gi.group_id = ug.group_id

However, I'm finding about 10% too many rows in the final output (which I expected to be of the same length as users)

I implemented the above using

SELECT * 
FROM users 
NATURAL JOIN user_groups
NATURAL JOIN group_importance

But I can't easily tell the error.

Note that I've checked for duplicate joining keys, and for spurious column overlaps.

Note that I'm using Snowflake/DBT.

CodePudding user response：

Either your user_groups table contains more than one entry for some users, or your group_importance table contains more than one entry for some groups, or both.

JOIN gives back all possible pairs of rows from the tables being joined. That is, it gives back all pairs of rows that match the explicit ON condition, or the implicit one generated by NATURAL JOIN. It's no surprise that FROM a NATURAL JOIN b sometimes yields more rows than are in a.

You could find the offending rows like this.

SELECT COUNT(*), user_id
  FROM user_groups
 GROUP BY user_id
HAVING COUNT(*) > 1
 ORDER BY COUNT(*) DESC

CodePudding user response：

Using natural joins make debugging a pain. I would run the query with explicit joins and add a qualify at the bottom to just look at the dupes. You can partition by additional columns to look at dupes for different groupings

qualify count(*) over (partition by u.user_id) > 1