Best way to write a query with too many joins-CodePudding

I have a database table (let's call it project) and many other tables, most of these other table have a foreign key (id_project) referencing the table project.

The goal of this query is to return which phase the project is in right now (a project develops little by little until it reaches its end) and there are over 20 tables that a project may pass by, my solution to this was using too many joins and see what table has null values like this

SELECT
    p.id_project
    CASE
        WHEN po.id IS NOT NULL THEN 'payment completed'
        WHEN b.id IS NOT NULL THEN 'bill received'
        WHEN e.id IS NOT NULL THEN 'project engaged'
        --(and still many other cases)
       ELSE 'start of the project'
    END AS progress
FROM 
    project p
LEFT JOIN
    decision d ON d.id_project = p.id_project
LEFT JOIN 
    engagement e ON e.id_project = p.id_project
LEFT JOIN 
    bill b ON b.id_project = p.id_project
LEFT JOIN 
    payment_order po ON po.id_project = p.id_project
LEFT JOIN  
    --..... (many other tables)

This query takes about 9 seconds at best to execute and it is used quite frequently (as a view called from other queries).

Is it possible to have another better solution or is this or another approach?

CodePudding user response：

Now about another approach? A project can be only in one phase at the moment; right? So, you could alter the PROJECT table and add a new column - PROJECT_PHASE - which would contain current phase. That column is to be updated as soon as project moves to another phase; the way I understood it, it is when a new row is created in any of those 20 tables.

Another option is to create a new table, project_phase which would contain id_project and `id_phase' combination (along with e.g. timestamp, whatever).

Any approach would mean that you'd quickly fetch current project phase, without outer joining 20 (large?) tables which takes time.

CodePudding user response：

We don't know your data, but your database design shows a 1:n relation for all tables, i.e. multiple decisions for one project, multiple engagements for one project, multiple bills, etc. Now let's assume there are three decisions, three engagements and four bills so far for a project. You join all rows on the project ID alone. This is called a cartesian product per project, creating all combinations (each row with each other row) producing 3 x 3 x 4 = 36 rows for this one project alone.

I am surprised you haven't noticed this yourself, as you say you are already using the query and there is no aggregation taking place. Or is this what you refer to with "too many joins"?

Instead of cross joining all those rows just look the tables up with EXISTS or IN.

SELECT
  p.id_project,
  CASE
    WHEN p.id_project IN (SELECT po.id_project FROM payment_order po) THEN 'payment completed'
    WHEN p.id_project IN (SELECT b.id_project  FROM bill b)           THEN 'bill received'
    WHEN p.id_project IN (SELECT e.id_project  FROM engagement e)     THEN 'project engaged'
    -- (and still many other cases)
    ELSE 'start of the project'
  END AS progress
FROM project p;

A faster alternative would be to store the status in the project table as suggested by Littlefoot (and ideally a table for the statuses) and then have triggers on all those tables that update that status.