I have a database table (let's call it project
) and many other tables, most of these other table have a foreign key (id_project
) referencing the table project
.
The goal of this query is to return which phase the project is in right now (a project develops little by little until it reaches its end) and there are over 20 tables that a project may pass by, my solution to this was using too many joins and see what table has null values like this
SELECT
p.id_project
CASE
WHEN po.id IS NOT NULL THEN 'payment completed'
WHEN b.id IS NOT NULL THEN 'bill received'
WHEN e.id IS NOT NULL THEN 'project engaged'
--(and still many other cases)
ELSE 'start of the project'
END AS progress
FROM
project p
LEFT JOIN
decision d ON d.id_project = p.id_project
LEFT JOIN
engagement e ON e.id_project = p.id_project
LEFT JOIN
bill b ON b.id_project = p.id_project
LEFT JOIN
payment_order po ON po.id_project = p.id_project
LEFT JOIN
--..... (many other tables)
This query takes about 9 seconds at best to execute and it is used quite frequently (as a view called from other queries).
Is it possible to have another better solution or is this or another approach?
CodePudding user response:
Now about another approach? A project can be only in one phase at the moment; right? So, you could alter the PROJECT
table and add a new column - PROJECT_PHASE
- which would contain current phase. That column is to be updated as soon as project moves to another phase; the way I understood it, it is when a new row is created in any of those 20 tables.
Another option is to create a new table, project_phase
which would contain id_project
and `id_phase' combination (along with e.g. timestamp, whatever).
Any approach would mean that you'd quickly fetch current project phase, without outer joining 20 (large?) tables which takes time.
CodePudding user response:
We don't know your data, but your database design shows a 1:n relation for all tables, i.e. multiple decisions for one project, multiple engagements for one project, multiple bills, etc. Now let's assume there are three decisions, three engagements and four bills so far for a project. You join all rows on the project ID alone. This is called a cartesian product per project, creating all combinations (each row with each other row) producing 3 x 3 x 4 = 36 rows for this one project alone.
I am surprised you haven't noticed this yourself, as you say you are already using the query and there is no aggregation taking place. Or is this what you refer to with "too many joins"?
Instead of cross joining all those rows just look the tables up with EXISTS
or IN
.
SELECT
p.id_project,
CASE
WHEN p.id_project IN (SELECT po.id_project FROM payment_order po) THEN 'payment completed'
WHEN p.id_project IN (SELECT b.id_project FROM bill b) THEN 'bill received'
WHEN p.id_project IN (SELECT e.id_project FROM engagement e) THEN 'project engaged'
-- (and still many other cases)
ELSE 'start of the project'
END AS progress
FROM project p;
A faster alternative would be to store the status in the project table as suggested by Littlefoot (and ideally a table for the statuses) and then have triggers on all those tables that update that status.