The table stores time-series data and contains approx 15 columns.
I want to optimize the SELECT
query which has filters on 3 columns
SELECT * FROM TABLE_1
WHERE COL_1 = ?
AND COL_2 = ?
AND COL_3 = ?
There are 2 indexes created on COL_1
and COL_2
but not on COL_3
. DB CPU is spiking to 100% when RPS (Request per Second) is around 1K.
DB Config is
- 8 CPUs
- 16 GB RAM
- 250 GB Storage
Is query processing more as there is no indexing on COL_3
? Is it a good practice to create an index on each column used in the WHERE
clause?
CodePudding user response:
Is it good practice to create index on all columns used in WHERE clause?
It is not good practice to create single-column indexes on all columns mentioned in WHERE clauses. Those indexes don't help your queries much, and they cost time and IO when you do INSERT and UPDATE operations.
It is good practice to create multicolumn indexes that match the WHERE clauses of your high-volume queries.
Your sample query
SELECT * FROM TABLE_1
WHERE COL_1 = ?
AND COL_2 = ?
AND COL_3 = ?
will benefit from a BTREE index on (COL_1, COL_2, COL_3)
. postgresql can random-access the index to the first row in your table that matches your WHERE clause, then retrieve the rows by scanning the index.
If your query were
SELECT * FROM TABLE_1
WHERE COL_1 = ?
AND COL_TIME >= ?
AND COL_3 = ?
you would want an index on (COL_1, COL_3, COL_TIME)
. Again, postgresql can random-access the index to the first eligible row, then scan the index sequentially until it gets to the last eligible row. Put the equality-match columns first in the index, then the range match (COL_TIME >= ?
) column.
Design your indexes to match
- your database requirements for primary keys and other constraints.
- your high-volume queries.
Just putting single-column indexes on many columns is a n00b mistake. Ask me how I know this sometime. ;-)
Indexing can seem arcane when you first start working with it. Marcus Winand's book https://use-the-index-luke.com/ is a good place to start learning.
CodePudding user response:
Indexing a table façing a query is not so easy that @OJones speaks... Because the WHERE clause is not the only SQL clause concerned in indexing the table... !
In fact all clauses of the query that are relative to the table you want to speed with indexes, must be analyzed.
As an example, the fact that you use a:
SELECT *
...into your query, does not help to be fast enough with an index containing only the columns used in the where clause. Very often, this index will not be used because to report all the values for all columns of the table (due to the SELECT *) the optimizer ("planer" as they say in PG) needs to do a seek into the index, and then do another access into the table to catch all columns that are not in the index definition...
Now you have the choice to create an index with all the columns of the table (and in this case the use of recently added INCLUDE clause like MS SQL Server does since 15 years, will help you to have a not too big index), or to reduce the returned columns listed in the SELECT clause and create a covering index...
A covering index is one that does not needs to access twice the table because it contains all the required columns of the query
In a french paper that you can read at :
I classify indexes with a "star" quotation:
- 1 star is an index that just cover the WHERE clause of a SELECT
- 2 stars is an index that cover WHERE and SELECT clause
- 3 stars is an index that cover WHERE, SELECT and ORDER BY clause
- 4 stars is an index that cover WHERE, SELECT, ORDER BY and GROUP BY clause