MySql view with group_by very slow-CodePudding

I have two tables, appointments and clients. The tables are big, each with about 100M records.

each has a primary key on id
clients has a foreign key on appointment_id

I'm using mysql 5.7

As one would expect, the following query is lightning fast:

select a.id, count(a.id) client_count
from appointments a
left join clients c on c.appointment_id = a.id
where a.id = 499
group by a.id;

But, if I create the following view and query, it is interminably slow:

create view client_counts as
select a.id, count(c.id) client_count
from appointments a
left join clients c on c.appointment_id = a.id
group by a.id;

select id, client_count
from client_counts
where id = 499;

I'm assuming the where clause is not being applied to the inner query (in the view), but rather every record of the view must be scanned to see if 499 is a match. I think this is caused by the group by.

Note: I did see that I can use a function in the where clause of the view, but it seems very clunky. Is this the preferred method?

My Question: What's the best way I can continue to use a view and have it be fast?

CodePudding user response：

Your first query does not really use c. So, I don't consider it worth discussing.
COUNT(x) checks each x for being not NULL. Usually people use COUNT(*) to count "rows".
VIEWs are syntactic sugar; I have yet to see an example where a VIEW runs faster than the equivalent SELECT. I have seen many cases where the VIEW seems to run significantly slower.
Would you like to discuss how to write the optimal SELECT to achieve some 'count'?

I think your View Select boil down to simply

SELECT COUNT(*) FROM clients WHERE appointment_id = 499;

That query will be lightning fast if clients has

INDEX(appointment_id)

(Sorry, MySQL is not smart enough to deduce this for you.)

CodePudding user response：

I think the best approach is to use a function; if you were using Postgres, maybe using a materialized view could solve your problem if the records don't change much... but in general I think you should use a function here, or just do a query instead of using a view.

Anyway, it's a good idea to check if the id fields are indexed.

CodePudding user response：

It seems, the optimizer in MySQL is not good enough yet to fully understand the query and apply the condition early. You are right, this is probably due to aggregation. What the DBMS sees is this:

select id, client_count
from
(
  select a.id, count(c.id) client_count
  from appointments a
  left join clients c on c.appointment_id = a.id
  group by a.id;
) client_counts
where id = 499;

which it can reduce to

select a.id, count(c.id) client_count
from appointments a
left join clients c on c.appointment_id = a.id
group by a.id
having a.id = 499;

but it doesn't see that the condition can be moved before the aggregation.

I am pretty sure you can workaround this by not aggregating over the joined rows, but over clients only. This is done by moving the aggregation into a subquery and place it in the select clause. (MySQL doesn't feature lateral joins yet, otherwise you'd have the choice to put the correlated subquery in the from clause instead.)

create view client_counts as
select
  a.id, 
  (
    select count(*)
    from clients c
    where c.appointment_id = a.id
  ) as client_count
from appointments a;