I have two tables master and a child table. So for the question I will refer master as m and child table as c.
Master table properties:
Records: 50 million
primary key (index): m_id
btree index: modified_id
Child table properties (1-n relationship):
Records: 400 million
primary key (index): c_id
foreign key (btree index): m_id
Query plan
Gather (cost=9159.80..6768939.59 rows=18940 width=107) (actual time=137160.885..297009.782 rows=25 loops=1)
Output: m.<date_column>, m.m_id, m.other_unique_id_1, m.modified_id, c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=11266 read=4680890
-> Hash Join (cost=8159.80..6766045.59 rows=7892 width=107) (actual time=87154.268..297003.331 rows=8 loops=3)
Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id, c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
Inner Unique: true
Hash Cond: (c.m_id = c.m_id)
Buffers: shared hit=11266 read=4680890
Worker 0: actual time=82162.730..297002.014 rows=13 loops=1
Buffers: shared hit=3789 read=1555017
Worker 1: actual time=42139.478..297002.514 rows=8 loops=1
Buffers: shared hit=3634 read=1569261
-> Parallel Seq Scan on child c (cost=0.00..6328357.20 rows=163629920 width=87) (actual time=0.638..279084.058 rows=130858840 loops=3)
Output: c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
Buffers: shared hit=11171 read=4680887
Worker 0: actual time=0.814..279170.920 rows=130423418 loops=1
Buffers: shared hit=3746 read=1555017
Worker 1: actual time=1.095..278955.836 rows=131603000 loops=1
Buffers: shared hit=3592 read=1569260
-> Hash (cost=8132.89..8132.89 rows=2153 width=20) (actual time=0.046..0.047 rows=8 loops=3)
Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id
Buckets: 4096 Batches: 1 Memory Usage: 33kB
Buffers: shared hit=35 read=3
Worker 0: actual time=0.026..0.027 rows=8 loops=1
Buffers: shared hit=13
Worker 1: actual time=0.059..0.060 rows=8 loops=1
Buffers: shared hit=12 read=1
-> Bitmap Heap Scan on master c (cost=41.25..8132.89 rows=2153 width=20) (actual time=0.031..0.041 rows=8 loops=3)
Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id
Recheck Cond: (c.modified_id = 561869)
Heap Blocks: exact=8
Buffers: shared hit=35 read=3
Worker 0: actual time=0.018..0.024 rows=8 loops=1
Buffers: shared hit=13
Worker 1: actual time=0.043..0.055 rows=8 loops=1
Buffers: shared hit=12 read=1
-> Bitmap Index Scan on ix_master_modified_id (cost=0.00..40.71 rows=2153 width=0) (actual time=0.027..0.027 rows=8 loops=3)
Index Cond: (c.modified_id = 561869)
Buffers: shared hit=11 read=3
Worker 0: actual time=0.015..0.015 rows=8 loops=1
Buffers: shared hit=5
Worker 1: actual time=0.038..0.038 rows=8 loops=1
Buffers: shared hit=4 read=1
Planning time: 0.354 ms
Execution time: 297009.825 ms
Query
select
*
from
master m
inner join child c on
m.m_id = c.m_id
where
m.modified_id = <xyz>
DDL statement for Master table
CREATE TABLE master (
<date_column> timestamp NULL DEFAULT CURRENT_DATE,
m_id serial4 NOT NULL,
<other_unique_id_1> int4 NULL,
modified_id int4 NULL,
CONSTRAINT master_pkey PRIMARY KEY (m_id),
CONSTRAINT <foreign key> FOREIGN KEY (<other_unique_id_1>) REFERENCES <other table>(<other_unique_id_1>)
);
CREATE INDEX ix_master_modified_id ON master USING btree (modified_id);
CREATE UNIQUE INDEX ix_master_other_unique_id_1 ON master USING btree (other_unique_id_1);
DDL for child table
CREATE TABLE child (
<date column> timestamp NULL DEFAULT now(),
c_id serial4 NOT NULL,
m_id int4 NULL,
<other_column_1> varchar(50) NULL,
<other_column_2> varchar(50) NULL,
<other_column_3> bool NULL,
<other_column_4> varchar(50) NULL,
<other_column_5> varchar(2) NULL,
<other_column_6> varchar(50) NULL,
<other_column_7> varchar(10) NULL,
<other_column_8> text NULL,
CONSTRAINT child_pkey PRIMARY KEY (c_id),
CONSTRAINT child_master_fkey FOREIGN KEY (m_id) REFERENCES master(m_id)
);
CREATE INDEX ix_child_m_id ON child USING btree (m_id);
CodePudding user response:
I think the root of the problem is this gross misestimate.
-> Bitmap Index Scan on ix_master_modified_id (cost=0.00..40.71 rows=2153 width=0) (actual time=0.027..0.027 rows=8 loops=3)
You said you already did VACUUM ANALYZE the table. In that case you might need to increase the stats size, and then ANALYZE again, to get a better estimate. You could either change the default_statistics_target globally, or target this one column with:
alter table master alter modified_id set statistics 10000;
analyze master;
You might not need to increase it all the way to 10000 (the max allowed), but if you only do it for one column I see no reason trying to fine tune it. Just break out the big guns right away. If it works, then you can worry about fine tuning it.