Home > database >  MySQL 'IN' operator on large number of values
MySQL 'IN' operator on large number of values

Time:05-25

I am observing weird behaviour which I am trying to understand.

MySQL version: 5.7.33 I have the below query:

select * from a_table where time>='2022-05-10' and guid in (102,512,11,35,623,6,21,673);

a_table has primary key on time,guid and index on guid
The query I wrote above has very good performance and as per explain plan is using index condition; using where; using MRR

As I increase the number of value in my in clause, the performance is impacted significantly.

After some dry runs, I was able to get a rough number. For less than ~14500 values the explain plan is same as above. For number of values higher than this, explain plan only uses where and it takes forever to run my query.

In other words, for example, if I put 14,000 values in my in clause, the explain plan has 14,000 rows as expected. However, if I put 15,000 values in my in clause, the explain has 221200324 rows. I dont even have these many rows in my whole table.

I am trying to understand this behaviour and to know if there is any way to fix this.

Thank you

CodePudding user response:

Read about Limiting Memory Use for Range Optimization.

When you have a large list of values in an IN() predicate, it uses more memory during the query optimization step. This was considered a problem in some cases, so recent versions of MySQL set a max memory limit (it's 8MB by default).

If the optimizer finds that it would need more memory than the limit, there is not another condition in your query it can use to optimize, it gives up trying to optimize, and resorts to a table-scan. I infer that your table statistics actually show that the table has ~221 million rows (though table statistics are inexact estimates).

I can't say I know the exact formula to know how much memory is needed for a given list of values, but given your observed behavior, we could guess that it's about 600 bytes per item on average, given that 14k items works and more than that does not work.

You can set range_optimizer_max_mem_size = 0 to disable the memory limit. This creates a risk of excessive memory use, but it avoids the optimizer "giving up." We set this value on all MySQL instances at my last job, because we couldn't educate the developers to avoid creating huge lists of values in their queries.

  • Related