Home > Enterprise >  PostgreSQL subquery COUNT fails when the subquery is joined more than once
PostgreSQL subquery COUNT fails when the subquery is joined more than once

Time:04-11

So a little context, I have 2 tables:

Table "Class" where I store a bunch of classes.

id serial4   // -> This is the PK
name varchar(64)
code varchar(64)

And Table "ClassEvent", where I store events related to classes, such as "started" and "ended".

id serial4
class_id int4 NOT NULL  // -> This is a FK to the Class table
event_type varchar(1) NOT NULL  // -> Here we store 's' for started and 'e' for ended.

Now, I need a query to get the amount of times each class has been started and ended. This works:

select
    c.code,
    c.name,
    count(started.id) "started"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
group by c.code, c.name
order by started desc;

But when I do exactly the same to get the amount of ended classes it shows incorrect amounts:

select
    c.code,
    c.name,
    count(started.id) "started",
    count(ended.id) "ended"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended 
    on ended.klass_id = c.id
group by c.code, c.name
order by started desc;

Also, the query takes significantly more time to execute. Is there anything I'm missing?

CodePudding user response:

You can try to use condition aggregate function

select
    c.code,
    c.name,
    count(CASE WHEN event_type = 's' THEN ended.id END) "started",
    count(CASE WHEN event_type = 'e' THEN ended.id END) "ended"
from "class" c
left join "class_event" started 
    on started.class_id = c.id
group by c.code, c.name
order by started desc;

CodePudding user response:

While you query the whole table it's typically cleaner and faster to aggregate the n-table before you join. See:

SELECT * 
FROM   class c
LEFT   JOIN (
   SELECT class_id AS id
        , count(*) FILTER (WHERE event_type = 's') AS started
        , count(*) FILTER (WHERE event_type = 'e') AS ended
   FROM   class_event
   GROUP  BY 1
   ) e  USING (id)
ORDER  BY e.started DESC NULLS LAST;

NULLS LAST because it's conceivable that some of the classes have no related rows in table class_event (yet), and the resulting NULL values surely shouldn't sort on top. See:

About the aggregate FILTER clause:

Aside:

For just a hand full of allowed values, I would consider the data type "char" instead of varchar(1) for event_type. See:

  • Related