Home > other >  Postgresql: optimal use of multicolumn-index when subset of index is missing from the where clause
Postgresql: optimal use of multicolumn-index when subset of index is missing from the where clause

Time:02-26

I will be having queries on my database with where clauses similar to this:

SELECT * FROM table WHERE a = 'string_value' AND b = 'other_string_value' AND t > <timestamp>

and less often to this:

SELECT * FROM table WHERE a = 'string_value' AND t > <timestamp>

I have created a multicolumn index on a, b and t on that order. However I am not sure if it will be optimal for my second -less frequent- query.

Will this index do an index scan on b or skip it and move to the t index immediately? (To be honest Im not sure how index scans work exactly). Should I create a second multi-column index on a and t only for the second query?

The docs state that

'the index is most efficient when there are constraints on the leading (leftmost) columns'

But in the example it doesn't highlight my case where the 'b' equality column is missing in the where clause.

CodePudding user response:

The 2nd query will be much less effective with the btree index on (a,b,t) because the absence of b means t cannot be used efficiently (it can still be used as an in-index filter, but that is not nearly as good as being used as a start/stop point). An index on (a,t) will be able to support the 2nd query much more efficiently.

But that doesn't mean you have to create that index as well. Indexes take space and must be maintained, so are far from free. It might be better to just live with less-than-optimal plans for the 2nd query, since that query is used "less often". On the other hand, you did bother to post about it, so maybe "less often" is still pretty often. So you might be better off just to build the extra index and spend your time worrying about something else.

A btree index can be thought of like a phonebook, which is sorted on last name, then first name, then middle name. Your first query is like searching for "people named Mary Smith with a middle name less than Cathy" You can use binary search to efficiently find the first "Mary Smith", then you scan through those until the middle name is > 'Cathy', and you are done. Compare that to "people surnamed Smith with a middle name less than Cathy". Now you have to scan all the Smith's. You can't stop at the first middle name > Cathy, because any change in first name resets the order of the middle names.

Given that b only has 10 distinct values, you could conceivably use the (a,b,t) index in a skip scan quite efficiently. But PostgreSQL doen't yet implement skip scans natively. You can emulate them, but that is fragile, ugly, a lot of work, and easy to screw up. Nothing you said here makes me think it would be worthwhile to do.

  • Related