Postgresql partition pruning not working when using unnest() in a subquery-CodePudding

Postgresql (13.4) is not able to come up with a query plan that uses execution-time partition pruning when using unnest() in a subquery.

Given these tables:

CREATE TABLE users (
    user_id uuid, 
    channel_id uuid, 
    CONSTRAINT user_pk PRIMARY KEY(user_id, channel_id)
) 
PARTITION BY hash(user_id);
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    
CREATE TABLE channels (
    channel_id uuid, 
    user_ids uuid[],
    CONSTRAINT channel_pk PRIMARY KEY(channel_id)
) PARTITION BY hash(channel_id);

CREATE TABLE channels_0 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_1 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 1);

Insert some data:

INSERT INTO users(user_id, channel_id) VALUES('0861180b-c972-42fe-9fb3-3b55e652f893', '45205876-7270-4e06-ab8d-b5f669298422');
INSERT INTO channels(channel_id, user_ids) VALUES('45205876-7270-4e06-ab8d-b5f669298422', '{0861180b-c972-42fe-9fb3-3b55e652f893}');

INSERT INTO users 
SELECT 
    gen_random_uuid() as user_id,
    gen_random_uuid() as channel_id
FROM generate_series(1, 100);

INSERT INTO channels
SELECT
    (SELECT max(channel_id::text) FROM (SELECT channel_id FROM users ORDER BY random()*generate_series LIMIT 1) c)::uuid as channel_id,
    (SELECT array_agg(DISTINCT user_id::text) FROM (SELECT user_id FROM users ORDER BY random()*generate_series 
    LIMIT 1) u)::uuid[] as user_ids
FROM (SELECT * FROM generate_series(1, 100)) g
ON conflict DO NOTHING;

The following query:

EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT unnest(user_ids) FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'

returns a query plan scanning all partitions.

Hash Semi Join  (cost=8.45..37.28 rows=8 width=32) (actual time=0.208..0.387 rows=1 loops=1)
  Hash Cond: (users.user_id = (unnest(channels.user_ids)))
  ->  Append  (cost=0.00..28.71 rows=8 width=32) (actual time=0.037..0.134 rows=1 loops=1)
        ->  Seq Scan on users_0 users_1  (cost=0.00..27.00 rows=7 width=32) (actual time=0.021..0.041 rows=1 loops=1)
              Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
              Rows Removed by Filter: 45
        ->  Seq Scan on users_1 users_2  (cost=0.00..1.68 rows=1 width=32) (actual time=0.018..0.027 rows=0 loops=1)
              Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
              Rows Removed by Filter: 54
  ->  Hash  (cost=8.33..8.33 rows=10 width=16) (actual time=0.131..0.172 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  ProjectSet  (cost=0.15..8.23 rows=10 width=16) (actual time=0.060..0.114 rows=1 loops=1)
              ->  Index Scan using channels_0_pkey on channels_0 channels  (cost=0.15..8.17 rows=1 width=32) (actual time=0.040..0.059 rows=1 loops=1)
                    Index Cond: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Planning Time: 0.363 ms
Execution Time: 0.515 ms

I would expect Postgresql to run the subquery and look at the user_id's returned to figure out what partitions this data will be in. However, Postgresql is looking into all partitions for this data. I have tried using one row pr user_id in the channels table, this works perfect.

EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT user_id FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'

Postgresql then does not run any of the steps for partitions that cannot hold any data.

It seems unnest() is causing execution time partition pruning to not work. Why is that?

SOLUTION: I can confirm jjanes's solution. By adding 100k rows to the tables the query, with unnest(), does partition pruning at execution-time.

CodePudding user response：

Your example only shows what it did use for one query, not everything it is capable of doing.

Your tables are laughably small. Put another 10,000 rows into the users table, so that the index is actually important, and see what it does.

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.85..151.99 rows=2 width=32) (actual time=0.040..0.042 rows=1 loops=1)
   ->  HashAggregate  (cost=1.57..1.67 rows=10 width=16) (actual time=0.022..0.023 rows=1 loops=1)
         Group Key: unnest(channels.user_ids)
         Batches: 1  Memory Usage: 24kB
         ->  ProjectSet  (cost=0.00..1.44 rows=10 width=16) (actual time=0.016..0.019 rows=1 loops=1)
               ->  Seq Scan on channels_0 channels  (cost=0.00..1.39 rows=1 width=37) (actual time=0.013..0.016 rows=1 loops=1)
                     Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
                     Rows Removed by Filter: 30
   ->  Append  (cost=0.28..15.01 rows=2 width=32) (actual time=0.016..0.017 rows=1 loops=1)
         ->  Index Only Scan using users_0_pkey on users_0 users_1  (cost=0.28..7.50 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
               Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
               Heap Fetches: 1
         ->  Index Only Scan using users_1_pkey on users_1 users_2  (cost=0.28..7.50 rows=1 width=32) (never executed)
               Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
               Heap Fetches: 0
 Planning Time: 0.470 ms
 Execution Time: 0.087 ms

The (never executed) is due to execution-time pruning.