Suppose I've got a table node
with two fields : id
, and child_id
:
id | name | child_id |
----------------------------
1 | node1 | NULL |
2 | node2 | 3 |
3 | node3 | NULL |
It means that "node1" has no parent and no children, "node2" has no parent and child "node2", and "node2" has parent "node2" and no children.
Now I want to select all "node" that have no parent.
In the example above I should get rows :
id | name | child_id |
----------------------------
1 | node1 | NULL |
2 | node2 | 3 |
How would you implement this query ?
CodePudding user response:
A traditional anti-join will produce the rows you want. For example:
select c.*
from node c
left join node p on c.id = p.child_id
where p.id is null
CodePudding user response:
I had to think about this, because it has the parent-child-relationships in opposite direction compared to traditional tree structures (where a node has a reference to its parent, not its child).
If you want all nodes without parents in your data structure, you want all nodes whose id is not used as a child id in other nodes. Right? (Because such other nodes would be the parents of that node.)
That could result in a query like this:
select *
from node
where id not in (select child_id
from node
where child_id is not null)
(Note that you have to be careful with three-valued logic in SQL, which could mess up a NOT IN clause. That's why I included an explicit WHERE-clause in the subquery so that that subquery only includes "valid" rows.)
By the way, the alternative query in the answer of The Impaler (which uses a join instead of a subquery) is quite fine as well. I'm not sure, but that query might even be (marginally) more efficient. However, I guess that a subquery expresses the intention of the main query more clearly, making it somewhat easier to read and to maintain. So I would personally stick with the subquery-query at first and move to the join-query only if that turns out to have a substantial and relevant performance benefit. But that's just my own humble opinion, of course. (And in such a case, I would keep the subquery-query as a comment in the code/script for documentation purposes.)