Home > Blockchain >  Optimal PostgreSQL single/multicolumn indexes?
Optimal PostgreSQL single/multicolumn indexes?

Time:10-01

I'm trying to determine the best indexes for a table in PostgreSQL. I expect on the order of ~10b rows and ~10TB data.

The table has 5 main columns used for filtering and/or sorting

  • Filtering: 3 Columns of binary data stored as bytea
  • Filtering / sorting: 2 Columns of type integer
CREATE TABLE table (
  filter_key_1 AS BYTEA,    -- filtering
  filter_key_2 AS BYTEA,    -- filtering
  filter_key_3 AS BYTEA,    -- filtering
  sort_key_1   AS INTEGER,  -- filtering & sorting
  sort_key_2   AS INTEGER   -- filtering & sorting
)

Queries will be:

SELECT * FROM table WHERE filter_key_1 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;

SELECT * FROM table WHERE filter_key_1 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;

What are the ideal indexes for the table? How large will they get with ~10b rows? How much will they limit write throughput?

Edit

What if I want to add additional queries such as below. Would the indexes from above hold-up?

SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 AND filter_key_3 = $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
-- ...

IO requirements

The workload is heavy read, low write.

Read speed is important. Write speed is less important (can live with up-to 3 seconds per insert)

  • Read:
    • expecting on average 150 read queries/sec
    • most queries pulling in 100 to 100,000 rows after WHERE and before LIMIT
  • Write:
    • expecting 1 write query/12sec, 0.08 queries/sec
    • writing 500-1000 rows/query, 42-84 rows/sec

CodePudding user response:

Since you need to run these queries all the time, you will have to optimize them as much as possible. That would mean

CREATE INDEX ON tab (filter_key_1, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_2, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_3, sort_key_1, sort_key_2);

Together, these indexes should be substantially larger than your table.

  • Related