I am trying to understand why Postgresql doesn't use index when one of the condition is in subquery.
I have index on (ticker, date)
EXPLAIN ANALYZE SELECT OPEN
FROM
daily_prices dp
WHERE
dp.ticker = ( SELECT ticker FROM tickers_profile WHERE short_name = '1TECH' )
AND DATE <= '2022-07-05'
Gather (cost=1008.30..116680.66 rows=1065 width=5) (actual time=191.767..731.514 rows=285 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Index Scan using tickers_profile_short_name_idx on tickers_profile (cost=0.28..8.30 rows=1 width=6) (actual time=0.023..0.024 rows=1 loops=1)
Index Cond: (short_name = '1TECH'::citext)
-> Parallel Seq Scan on daily_prices dp (cost=0.00..115565.86 rows=444 width=5) (actual time=364.367..719.817 rows=95 loops=3)
Filter: ((date <= '2022-07-05'::date) AND ((ticker)::text = $0))
Rows Removed by Filter: 674481
Planning Time: 0.134 ms
Execution Time: 731.557 ms
If subquery is inlined (as a param)
EXPLAIN ANALYZE SELECT OPEN
FROM
daily_prices dp
WHERE
dp.ticker = '03041'
AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp (cost=30.42..3646.10 rows=975 width=5) (actual time=0.506..0.758 rows=285 loops=1)
Recheck Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
Heap Blocks: exact=16
-> Bitmap Index Scan on daily_prices_pkey (cost=0.00..30.18 rows=975 width=0) (actual time=0.490..0.490 rows=285 loops=1)
Index Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
Planning Time: 0.077 ms
Execution Time: 0.793 ms
I have tried:
- CTE (with statements), same performance
- Join statement, same performance
EDIT: Using a functions work
CREATE
OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
ticker
FROM
tickers_profile
WHERE
short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN
FROM
daily_prices dp
WHERE
dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) )
AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Heap Blocks: exact=16
InitPlan 1 (returns $0)
-> ProjectSet (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
-> Bitmap Index Scan on daily_prices_pkey (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms
CodePudding user response:
PostgreSQL could use an index there, but it prefers not to, probably because there are some values for daily_prices.ticker
that occur very frequently and would make an index scan a bad choice. The optimizer doesn't know which value the subquery will return. Perhaps split the job into two queries, then the optimizer knows when it plans the second query.
If you know that none of the frequent values can be returned, you could add a WHERE
condition like AND dp.ticker NOT IN ('frequent value 1', 'frequent value 2')
.
CodePudding user response:
Using a function works for my case
CREATE
OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
ticker
FROM
tickers_profile
WHERE
short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN
FROM
daily_prices dp
WHERE
dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) )
AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Heap Blocks: exact=16
InitPlan 1 (returns $0)
-> ProjectSet (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
-> Bitmap Index Scan on daily_prices_pkey (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms