Home > database >  PostgreSQL recursive select find root element from leaf
PostgreSQL recursive select find root element from leaf

Time:05-17

I am developing a database for a forum, with threads and messages. Threads start with a message with no parent_id; replies are messages with parent_id.

I have a table for the messages. Each item reference to items on same table, to relate them as parent-child.

create table messages(
  id int,
  title text,
  content text,
  parent_id int
);

Now I fill the table with some Data:

insert into messages values 
(1, 'One', 'First  thread main post', null), -- First thread
(2, 'Two', 'First thread reply', 1), 
(3, 'Three', 'First thread reply', 2),
(4, 'Four', 'Second thread main post', null), -- Second thread
(5, 'Five', 'Second thread reply', 4),
(6, 'Six', 'Second thread reply', 5),
(7, 'Six', 'Second thread reply', 6),
(8, 'Six', 'Second thread reply', 7),
(9, 'Six', 'Second thread reply', 8); 

Knowing the id of the first message of a thread, I can retrieve all replies with a recursive query:

with recursive my_tree as (
    select * from messages 
    where parent_id IS null and id = 4
  union all
    select messages.* from messages 
    join my_tree on messages.parent_id = my_tree.id
) select my_tree.* 
    from my_tree;

Here is a fiddle.

Now, lets say I know the id of some reply, and I want to get the id of the first message of the thread. How can I do it?

EDIT: I know I can retrieve all messages of a thread from the known id upwards:

with recursive my_tree as (
    select * from messages 
    where id = 9
  union all
    select messages.* from messages 
    join my_tree on messages.id = my_tree.parent_id
) select my_tree.* 
    from my_tree;

Here a fiddle. But if the thread consists of thousands of messages that would be less than efficient. Is there any way a recursive query can get to first item without retrieving all messages?

CodePudding user response:

With the current data structure, there is no other way than to visit the entire path from leaf to root. You can speed up this search a bit by operating only on key columns:

with recursive my_tree as (
    select id, parent_id
    from messages 
    where id = 9
union all
    select m.id, m.parent_id 
    from messages m
    join my_tree t on m.id = t.parent_id
) 
select m.*
from messages m
join my_tree t using(id)
where t.parent_id is null

db<>fiddle.

If this solution is not satisfactory, you can consider adding a column to the table

alter table messages add root_id int;

Filling this column with a recursive query will be easy and you will have immediate access to the root of every node.

An alternative is to change the data structure using the ltree extension.

CodePudding user response:

Your query is already retrieving the minimum number of rows. The only way to find the original parent is to retrieve a parent, then retrieve that parent,... And keep at it until you reach the ultimate parent.

One thing you might not know about recursive parts of recursive queries is that each step only works with the results from the prior step, not from the accumulated UNION result so far. This means that each cycle through your query example only retrieved one row, its immediate parent. It's hard to see how to make that any more efficient; six queries each returning one row.

Now, it sounds like you are not interested in retaining the intermediate rows, you just want the ID of the ultimate parent. If you post-process your recursive query result set with a second CTE, you can return just that one record. I added onto your original query like so:

with recursive my_tree as (
    select *,1 as depth from messages 
    where parent_id IS null and id = 4
 union all
    select messages.*, my_tree.depth 1 
    from messages 
    join my_tree on messages.parent_id = my_tree.id
), my_root as (select * from my_tree
     where depth=(select max(depth)
                  from my_tree)
   )
select * 
    from my_root;

This addition only returns the last row of the recursive query.

  • Related