Which indexing algorithm is better for less varied but frequently occurring data in PostgreSQL

Consider an enum-like column with roughly 8-12 distinct string values that appears across 100 million rows in a PostgreSQL table. This column is used in a complex search query in conjunction with other conditions. Objectively speaking, which indexing algorithm (GiST or B-tree) would offer the most performance gain for this specific column?

CodePudding user response:

If I understand correctly, you have about 8 to 12 distinct string values in a column of a 100-million-row table.

If those 8 to 12 values are evenly distributed, a filter on any one of them will return roughly 10 million rows, which is far too many for an index on this column alone to be useful. You have to create composite indexes that involve all the columns included in:

  • first, the WHERE predicates using equality;
  • second, the WHERE predicates using inequality (>, <, ...);
  • third, the ON conditions of JOIN operators;
  • fourth, the GROUP BY or DISTINCT clauses;
  • fifth, the ORDER BY clause, if any.

The columns should appear in the index in that specific order; a sketch follows below.
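As a minimal sketch of that rule, suppose the query filters on the enum-like column with equality, applies a range condition on a timestamp, and sorts by that same timestamp. All table and column names here (orders, status, created_at) are hypothetical stand-ins, not from the question:

```sql
-- Hypothetical query shape:
--   SELECT ... FROM orders
--   WHERE  status = 'shipped'           -- equality predicate   -> first
--   AND    created_at >= '2023-01-01'   -- inequality predicate -> second
--   ORDER  BY created_at;               -- sort key             -> last

-- Composite B-tree index following that column order
-- (created_at serves both the range filter and the sort, so it appears once):
CREATE INDEX idx_orders_status_created_at
    ON orders (status, created_at);
```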

CodePudding user response:

If the distribution is even, and a WHERE condition on this column will reduce the result set by a factor of 8 to 12, then an index on the column may well make sense.
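For illustration only, such a single-column index would be an ordinary B-tree, which is PostgreSQL's default index type for equality filters; the names reuse the hypothetical ones above:

```sql
-- Plain B-tree index on the low-cardinality column alone:
CREATE INDEX idx_orders_status ON orders (status);
-- "orders" and "status" are illustrative stand-ins for the
-- 100-million-row table and its 8-12-value enum column.
```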

However, you should never decide on an index just by looking at the data in the table. The most important thing to consider is the query that should become faster. Once the query is known, the answer can be much more definitive.
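One concrete way to do that, again assuming the hypothetical names from the sketches above, is to run the actual query under EXPLAIN (ANALYZE) and check whether the planner uses the index at all:

```sql
-- Inspect the real plan, row counts, and buffer usage for the query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   orders
WHERE  status = 'shipped'
AND    created_at >= '2023-01-01'
ORDER  BY created_at;
-- If the planner still picks a sequential scan, the filter on "status"
-- alone is probably not selective enough to make the index pay off.
```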
