I'm trying to improve the performance of a query with this table:
CREATE TABLE frag(hash primary key, type, ulen, clen, offset, refs);
SELECT MAX(offset clen) AS 'EndPos', type AS 'Type' FROM frag GROUP BY type;
My query plan is as such:
sqlite> EXPLAIN QUERY PLAN SELECT MAX(offset clen) AS 'Offset', type AS 'Type' FROM frag GROUP BY type;
QUERY PLAN
|--SCAN TABLE frag
`--USE TEMP B-TREE FOR GROUP BY
After creating a new index, the query plan changes to this:
CREATE INDEX max_frag ON frag(type, offset clen DESC);
sqlite> EXPLAIN QUERY PLAN SELECT MAX(offset clen) AS 'EndPos', type AS 'Type' FROM frag GROUP BY type;
QUERY PLAN
`--SCAN TABLE frag USING INDEX max_frag
My issue is that my query has slowed from from taking 90s to 280s with this new index. My database is 5.6GB large, with 45.5 million rows. Why is a full table scan faster than using this index? .expert
is telling me that no new index can improve this
CodePudding user response:
I don't know if SQLite support function indices, but one alternative approach you could try here would be to store the sum offset clen
in the table as some new column, say total
. Then, you may add the following index to the table:
CREATE INDEX idx ON frag (type, total DESC);
Then, the above index should work with the following query:
SELECT MAX(total) AS EndPos, type AS Type FROM frag GROUP BY type;
CodePudding user response:
As you stated in your comment, expressions are supported in indices. However, "The SQLite query planner will consider using an index on an expression when the expression that is indexed appears in the WHERE clause or in the ORDER BY clause of a query".
Since you are using it in an aggregate function, that part of the index will not be used and only the type part of the index will be used. If type is a field with only a few different values across the table, and the values are scattered in all pages of the database, the result is that for each value of type, sqlite will have to read nearly all the pages of the table to get max(offset clen) and this has to be repeated for every value of type.
The solution is to add to your table a generated colum
total AS (offset clen) STORED
and then using this column in the index and query
CREATE INDEX max_frag ON frag(type, total DESC);
ANALYZE;
SELECT MAX(total) AS 'EndPos', type AS 'Type' FROM frag GROUP BY type;
The result will be that only the index will be read, without accessing the table data, and the query plan should be like:
SCAN TABLE frag USING COVERING INDEX max_frag