For a tree like this, I'd like to choose from a given list of nodes only top nodes discarding those that are already contained in a parent node. Example Tree
- World
- Americas
- North America
- Usa
- Alabama
- New-York *
- Brooklyn
- Queens *
- Canada *
- Usa
- North America
- Europe *
- Germany
- Italy *
- France
- Paris *
- Americas
Given this list: New-York, Queens, Canada, Europe, Italy, Paris
The espected output is: New-York, Canada, Europe.
(Queens is discarded because is in New-York, Italy is discarded because is in Europe, Paris is discarded because is also in Europe, etc)
My tree is physicaly in an oracle table tree with records id, parent_id
What is the optimal algorithm to apply in this scenario? Avoiding unnecesary calls to DB.
CodePudding user response:
Clearly the worst case is that you have to traverse the whole tree, if for instance none of your search targets are in the tree, or they happen to be the last N leaves you visit.
So, any tree traversal algorithm should work, provided that as soon as you find one of your target values, you stop searching the subtree rooted at that node. That will avoid unnecessary searching of a subtree that cannot have any other top-level items than the root. In your example, the traversal might look like this:
World, North Americas, USA, Alabama, New York (this is a match; return and go back up the tree until there is an unvisited child somewhere), Canada (this is a match; return and go back up the tree until there is an unvisited child somewhere), Europe (this is a match; return and go back up the tree until there is an unvisited child somewhere).
CodePudding user response:
If the number of values in the list is relatively small to the number of the values in the table, then you might start from the records that correspond to the list values, and walk upwards in the tree until you either reach the root or reach another value in the list. Only retain those results where the upward search stops at the root.
I'll assume that the id
and parent_id
fields are the names of the countries (alternatively they could be unique numbers, with the name of the country in a third field). I'll also assume that there is a record with id
equal to 'World'
and a corresponding parent_id
that is NULL
.
Then you could do this with a single SQL query on the Oracle database:
SELECT id
FROM (
SELECT CONNECT_BY_ROOT id id,
parent_id
FROM tbl
START WITH id IN ('New-York', 'Queens', 'Canada', 'Europe', 'Italy', 'Paris')
CONNECT BY PRIOR parent_id = id
AND id NOT IN ('New-York', 'Queens', 'Canada', 'Europe', 'Italy', 'Paris')
)
WHERE parent_id IS NULL;
The list appears twice in the query, once to denote at which records to start, and once to indicate when the upward traversal should abort.
The final WHERE
clause ensures that the paths from lists reached the root, and did not abort at another value in the list.
The above query returns this result for the tree you depicted:
Canada
Europe
Queens
See it run on SQL Fiddle