I'm trying to determine the best indexes for a table in PostgreSQL. I expect on the order of ~10b rows and ~10TB data.
The table has 5 main columns used for filtering and/or sorting
- Filtering: 3 Columns of binary data stored as bytea
- Filtering / sorting: 2 Columns of type integer
CREATE TABLE table (
filter_key_1 AS BYTEA, -- filtering
filter_key_2 AS BYTEA, -- filtering
filter_key_3 AS BYTEA, -- filtering
sort_key_1 AS INTEGER, -- filtering & sorting
sort_key_2 AS INTEGER -- filtering & sorting
)
Queries will be:
SELECT * FROM table WHERE filter_key_1 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_1 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
What are the ideal indexes for the table? How large will they get with ~10b rows? How much will they limit write throughput?
Edit
What if I want to add additional queries such as below. Would the indexes from above hold-up?
SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 AND filter_key_3 = $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
-- ...
IO requirements
The workload is heavy read, low write.
Read speed is important. Write speed is less important (can live with up-to 3 seconds per insert)
- Read:
- expecting on average 150 read queries/sec
- most queries pulling in 100 to 100,000 rows after WHERE and before LIMIT
- Write:
- expecting 1 write query/12sec, 0.08 queries/sec
- writing 500-1000 rows/query, 42-84 rows/sec
CodePudding user response:
Since you need to run these queries all the time, you will have to optimize them as much as possible. That would mean
CREATE INDEX ON tab (filter_key_1, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_2, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_3, sort_key_1, sort_key_2);
Together, these indexes should be substantially larger than your table.