Optimize MySQL Query - single table with SubQuery-CodePudding

I have a single table with ~ 1.5 mio entries and this query:

SELECT id 
FROM abc.items as I 
where revisit < '2022-11-17T00:00:00Z' 
AND type = 10 
AND firstdelivery = (
                        select min(firstdelivery) 
                        from abc.items 
                        WHERE I.polizzennummer = polizzennummer
                    ) 
LIMIT 50

Which takes about 120-180 seconds to run. The revisit is the part which changes the most, this is a query which runs every day.

Table schema:

'id','int(10) unsigned','NO','PRI',NULL,'auto_increment'
'polizzennummer','varchar(64)','YES','MUL',NULL,''
'type','tinyint(4)','YES','MUL',NULL,''
'revisit','datetime','YES','MUL',NULL,''
'status','tinyint(4)','YES','',NULL,''
'ts','datetime','YES','',NULL,''
'grund','text','YES','',NULL,''
'firstdelivery','datetime','YES','MUL',NULL,''

Explain Output:

'1', 'PRIMARY', 'I', NULL, 'ref', 'idx_abc_type,idx_abc_revisit,idx_abc_type_revisit_firstdelivery', 'idx_abc_type', '2', 'const', '499036', '50.00', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'abc', NULL, 'ref', 'polizzennummer,idx_abc_polizzennummer_firstdelivery', 'idx_abc_polizzennummer_firstdelivery', '195', 'abc.I.polizzennummer', '7', '100.00', 'Using index'

The table index output:

'abc', '0', 'PRIMARY', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '0', 'id_UNIQUE', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '1', 'polizzennummer', '1', 'polizzennummer', 'A', '117288', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_revisit', '1', 'revisit', 'A', '93553', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_firstdelivery', '1', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '1', 'polizzennummer', 'A', '125237', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '2', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '2', 'revisit', 'A', '116078', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '3', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''

Im not sure how i could optimize this query further.

Any help or suggestion is appreciated

CodePudding user response：

I can't really figure out your indices from what you included in the question. Here are the indices I would suggest for this query. For the correlated subquery, use:

CREATE INDEX idx1 ON table1.items (polizzennummer, firstdelivery);

This index should allow for rapid evaluation given each row in the outer query. For the outer query itself, you may try:

CREATE INDEX idx2 ON abc.items (revisit, type, firstdelivery);

This covers the WHERE clause of the outer query.

Note that your current query is using LIMIT without ORDER BY, which does not make much sense. You should add an ORDER BY clause to the outer query.

CodePudding user response：

You want the first delivery and only show it if its type is 10 and its revisit date is before 2022-11-17. But in order to do this you select all candidates, and for each you select all rows with the same polizzennummer. This can mean that you run the same subquery again and again.

Three approaches to help here:

Idea #1: Get the minimum firstdelivery once per polizzennummer

SELECT id 
FROM abc.items
WHERE revisit < '2022-11-17T00:00:00Z' 
AND type = 10
AND (polizzennummer, firstdelivery) IN
(
  SELECT polizzennummer, MIN(firstdelivery)
  FROM abc.items
  GROUP BY polizzennummer
);

Idea #2: Stop looking when you find an earlier firstdelivery

SELECT i.id 
FROM abc.items i 
WHERE i.revisit < '2022-11-17T00:00:00Z' 
AND i.type = 10
AND NOT EXISTS
(
  SELECT null
  FROM abc.items i2
  WHERE i2.polizzennummer = i.polizzennummer
  AND i2.firstdelivery < i.firstdelivery
);

Idea #3: Find the minimum firstdelivery for each row and continue working on that dataset

SELECT *
FROM
(
  SELECT i.*, MIN(firstdelivery) OVER (PARTITION BY polizzennummer) AS min_firstdelivery
  FROM abc.items i 
) with_min_fd
WHERE firstdelivery = min_firstdelivery
AND revisit < '2022-11-17T00:00:00Z' 
AND type = 10;

As to indexes: Your index idx_abc_polizzennummer_firstdelivery on abc.items(polizzennummer, firstdelivery) should suffice for above queries.