Home > Enterprise >  SQLite create indexes for columns with WHERE and ORDER BY clauses
SQLite create indexes for columns with WHERE and ORDER BY clauses

Time:09-16

My function creates filtering query from following dictionary input with fixed ordered keys

input = {
    "A": ["string"],
    "B": [
          "1234",
          "4567"
    ],
    "C": ["string"],
    "D": ["string"]
}

Each keys represents column names of sqlite file and value is the values that want to filter. There is another column called idx for index in sqlite db file.

I initially created indexes for all combinations of columns, but it's been pointed out that it's way too much (see my previous question for reference if you need)

The filtering query usually follows like these:

  1. SELECT * FROM result_table WHERE idx >= 1900 AND (B in (1234)) LIMIT 100
  2. SELECT * FROM result_table WHERE idx >= 1900 AND (B in (1234,5678)) LIMIT 100
  3. SELECT * FROM result_table WHERE idx >= 1900 AND (B in (1234) and D in ('somestring')) LIMIT 100
  4. SELECT * FROM result_table WHERE idx >= 1900 AND (B in (1234,5678)) ORDER BY A ASC, ORDER BY D DESC LIMIT 100
  5. SELECT * FROM result_table WHERE idx >= 1900 AND (B in (1234,5678) and C in ('somestring2')) ORDER BY A ASC, ORDER BY C DESC LIMIT 100

and so on

so it's mix and match of single or multiple of WHERE and ORDER BY clauses

From the comments I got my previous question, I understood how I should create indexes when I don't use ORDER BY clause, but as I put ORDER BY, the query does not use any indexes I created.

The indexes I created are following:

(with/without idx)
(idx,) A, B, C, D

(idx,) A, C, D

(idx,) A, D

(idx,) B, C, D

(idx,) B, D

(idx,) C, D

(idx,) D

Which indexes I should create more to apply for query with combination of WHERE and ORDER BY clauses?

CodePudding user response:

Queries 1 and 2 can use indexes B,C,D or B,D. Since idx is constrained by an inequality, it can be used only if it is immediately at the right of B in the index, not before. So B,idx would be a better choice, but not idx,B.

Best index for query 3 would be B,D,idx (or D,B,idx). B,D should be the one chosen among the indexes you created.

To execute query 4, sqlite can use the same index used in queries 1 or 2 to speed up the WHERE clause, or it could use index A,D to speed up the ORDER BY clause. Sqlite will make the choice based of what it thinks is the better plan and this choice will be made based on how many rows are in the table, how much selective is the WHERE BY clause (what percentage of rows will have a B value of 1234 or 5678) and how much selective is the LIMIT clause (are 100 rows a lot less than the total rows?). To correctly make these assumption, sqlite must have gathered statistics on the content of the table using ANALYZE.

Similarly, query 5 will use index B,C,D (B,C,idx would be better) for the WHERE or index A,C,D for the ORDER BY

Citing sqlite docs:

SQLite uses a cost-based query planner. When there are two or more ways of solving the same query, SQLite tries to estimate the total amount of time needed to run the query using each plan, and then uses the plan with the lowest estimated cost. A cost is computed mostly from the estimated time, and so this case could go either way depending on the table size and what WHERE clause constraints were available, and so forth. But generally speaking, the indexed sort would probably be chosen, if for no other reason, because it does not need to accumulate the entire result set in temporary storage before sorting and thus uses much less temporary storage.

As for your question, the best index is the one that can be used to satisfy both WHERE and ORDER BY clause.

If there are much more than 100 records with a value of 1234 or 5678 for B, for query 4 the best one could be (A,D,B,idx). Otherwise, (B,A,D,idx) could be a better choice.

For query 5 the best one could be (B,C,A,idx), (C,B,A,idx), (A,C,B,idx) or some other strange combination of these columns.

My suggestion is to download latest sqlite.exe (the command line interface) and then use the .expert --sample 100 command followed by the SQL query on a separate line. The ".expert" command will propose indexes that might assist with those specific queries, were they present in the database (Reference: https://www.sqlite.org/cli.html#index_recommendations_sqlite_expert_).

  • Related