Home > Enterprise >  recursive query presto SQL
recursive query presto SQL

Time:12-20

I need to find teacher_id who don't have the last highest teacher (the highest must be teacher_id 7 for everyone) to whom they report using PRESTO SQL. Hierarchy depth can be greater than shown in the example dataset.

Query result must have issued teacher_id: 8,9,10,11

OUTPUT:

teacher_full_name teacher _id
Jane Ortego 8
Michael Powers 9
Sasha Li 10
Diana Norris 11

Example dataset:

teacher_full_name teacher_id reporter_teacher
Margo Holmes 1 2
Carry Miles 2 3
Philipp Baskov 3 4
Harry Potter 4 5
Daniel Lopez 5 6
Ivan Petrov 6 7
Jhon Doe 7
Jane Ortego 8 9
Michael Powers 9 10
Sasha Li 10 9
Diana Norris 11 10
Wolter White 12 6

I used PostgreSQL to realize such recursive logic but I do not know how to realize it in Presto.

CodePudding user response:

AFAIK Presto does not support recursive queries, if you can migrate to Trino you can use WITH RECURSIVE CTE clause. My take looks like this (probably can be optimized):

with recursive dataset(teacher_full_name, teacher_id, reporter_teacher) as(
    values ('Margo Holmes', 1,  2),
        ('Carry Miles',     2,  3),
        ('Philipp Baskov',  3,  4),
        ('Harry Potter',    4,  5),
        ('Daniel Lopez',    5,  6),
        ('Ivan Petrov',     6,  7),
        ('Jhon Doe',        7,  null),
        ('Jane Ortego',     8,  9),
        ('Michael Powers',  9,  10),
        ('Sasha Li',        10, 9),
        ('Diana Norris',    11, 10),
        ('Wolter White',    12, 6)
),
t(lvl, teacher_full_name, teacher_id, reporter_teacher, all_parents) as(
    select 1 lvl, teacher_full_name, teacher_id, reporter_teacher, array[reporter_teacher] all_parents
    from dataset
    union all
    select t.lvl  1,
        t.teacher_full_name,
        t.teacher_id,
        d.reporter_teacher,
        d.reporter_teacher || t.all_parents as all_parents
    from dataset d
    inner join t on t.reporter_teacher = d.teacher_id
    where not contains(all_parents, d.reporter_teacher)
)

select arbitrary(t.teacher_full_name) full_name,
       t.teacher_id
from t
group by t.teacher_id
having not contains(max_by(t.all_parents, t.lvl), 7)
order by teacher_id;

Output:

full_name teacher_id
Jane Ortego 8
Michael Powers 9
Sasha Li 10
Diana Norris 11

CodePudding user response:

Unfortunately, Presto doesn't support recursive CTEs, so the best you can do is probably a chain of them. Something like this:

WITH R1 AS (
    SELECT teacher_id
    FROM @input
    WHERE reporter_teacher IS NULL
),
R2 AS (
    SELECT i.teacher_id
    FROM @input i
    INNER JOIN R1 ON i.reporter_teacher=R1.teacher_id
),
R3 AS (
    SELECT i.teacher_id
    FROM @input i
    INNER JOIN R2 ON i.reporter_teacher=R2.teacher_id
)

--R4 AS ... referencing R3
--R5 AS ... referencing R4

SELECT * FROM @input WHERE teacher_id NOT IN (
    SELECT * FROM R1
    UNION SELECT * FROM R2
    UNION SELECT * FROM R3
    --UNION SELECT * FROM R4 etc
)

You'll need more CTEs in your expression since you've said that the hierarchy depth can be larger than shown in the dataset.

A quick explanation of how this works:

R1 selects all employees that don't report to anyone, so in the sample data, Jhon Doe.

R2 selects all employees that report to employees in R1. In the example, Ivan Petrov because he reports to Jhon Doe. And so on down the list. Recursing far enough, you'll eventually get all the employees that report to Jhon Doe.

The last bit unions all the R_ns, giving you a list of people to exclude from the final select.

  • Related