I need to find teacher_id who don't have the last highest teacher (the highest must be teacher_id 7 for everyone) to whom they report using PRESTO SQL. Hierarchy depth can be greater than shown in the example dataset.
Query result must have issued teacher_id: 8,9,10,11
OUTPUT:
teacher_full_name | teacher _id |
---|---|
Jane Ortego | 8 |
Michael Powers | 9 |
Sasha Li | 10 |
Diana Norris | 11 |
Example dataset:
teacher_full_name | teacher_id | reporter_teacher |
---|---|---|
Margo Holmes | 1 | 2 |
Carry Miles | 2 | 3 |
Philipp Baskov | 3 | 4 |
Harry Potter | 4 | 5 |
Daniel Lopez | 5 | 6 |
Ivan Petrov | 6 | 7 |
Jhon Doe | 7 | |
Jane Ortego | 8 | 9 |
Michael Powers | 9 | 10 |
Sasha Li | 10 | 9 |
Diana Norris | 11 | 10 |
Wolter White | 12 | 6 |
I used PostgreSQL to realize such recursive logic but I do not know how to realize it in Presto.
CodePudding user response:
AFAIK Presto does not support recursive queries, if you can migrate to Trino you can use WITH RECURSIVE
CTE clause. My take looks like this (probably can be optimized):
with recursive dataset(teacher_full_name, teacher_id, reporter_teacher) as(
values ('Margo Holmes', 1, 2),
('Carry Miles', 2, 3),
('Philipp Baskov', 3, 4),
('Harry Potter', 4, 5),
('Daniel Lopez', 5, 6),
('Ivan Petrov', 6, 7),
('Jhon Doe', 7, null),
('Jane Ortego', 8, 9),
('Michael Powers', 9, 10),
('Sasha Li', 10, 9),
('Diana Norris', 11, 10),
('Wolter White', 12, 6)
),
t(lvl, teacher_full_name, teacher_id, reporter_teacher, all_parents) as(
select 1 lvl, teacher_full_name, teacher_id, reporter_teacher, array[reporter_teacher] all_parents
from dataset
union all
select t.lvl 1,
t.teacher_full_name,
t.teacher_id,
d.reporter_teacher,
d.reporter_teacher || t.all_parents as all_parents
from dataset d
inner join t on t.reporter_teacher = d.teacher_id
where not contains(all_parents, d.reporter_teacher)
)
select arbitrary(t.teacher_full_name) full_name,
t.teacher_id
from t
group by t.teacher_id
having not contains(max_by(t.all_parents, t.lvl), 7)
order by teacher_id;
Output:
full_name | teacher_id |
---|---|
Jane Ortego | 8 |
Michael Powers | 9 |
Sasha Li | 10 |
Diana Norris | 11 |
CodePudding user response:
Unfortunately, Presto doesn't support recursive CTEs, so the best you can do is probably a chain of them. Something like this:
WITH R1 AS (
SELECT teacher_id
FROM @input
WHERE reporter_teacher IS NULL
),
R2 AS (
SELECT i.teacher_id
FROM @input i
INNER JOIN R1 ON i.reporter_teacher=R1.teacher_id
),
R3 AS (
SELECT i.teacher_id
FROM @input i
INNER JOIN R2 ON i.reporter_teacher=R2.teacher_id
)
--R4 AS ... referencing R3
--R5 AS ... referencing R4
SELECT * FROM @input WHERE teacher_id NOT IN (
SELECT * FROM R1
UNION SELECT * FROM R2
UNION SELECT * FROM R3
--UNION SELECT * FROM R4 etc
)
You'll need more CTEs in your expression since you've said that the hierarchy depth can be larger than shown in the dataset.
A quick explanation of how this works:
R1
selects all employees that don't report to anyone, so in the sample data, Jhon Doe
.
R2
selects all employees that report to employees in R1
. In the example, Ivan Petrov
because he reports to Jhon Doe
. And so on down the list. Recursing far enough, you'll eventually get all the employees that report to Jhon Doe
.
The last bit unions all the R_n
s, giving you a list of people to exclude from the final select.