Home > Software engineering >  Postgresql not using index when filter is in subquery
Postgresql not using index when filter is in subquery

Time:07-06

I am trying to understand why Postgresql doesn't use index when one of the condition is in subquery.

I have index on (ticker, date)

EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT ticker FROM tickers_profile WHERE short_name = '1TECH' ) 
    AND DATE <= '2022-07-05'
Gather  (cost=1008.30..116680.66 rows=1065 width=5) (actual time=191.767..731.514 rows=285 loops=1)
  Workers Planned: 2
  Params Evaluated: $0
  Workers Launched: 2
  InitPlan 1 (returns $0)
    ->  Index Scan using tickers_profile_short_name_idx on tickers_profile  (cost=0.28..8.30 rows=1 width=6) (actual time=0.023..0.024 rows=1 loops=1)
          Index Cond: (short_name = '1TECH'::citext)
  ->  Parallel Seq Scan on daily_prices dp  (cost=0.00..115565.86 rows=444 width=5) (actual time=364.367..719.817 rows=95 loops=3)
        Filter: ((date <= '2022-07-05'::date) AND ((ticker)::text = $0))
        Rows Removed by Filter: 674481
Planning Time: 0.134 ms
Execution Time: 731.557 ms

If subquery is inlined (as a param)

EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = '03041'
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=30.42..3646.10 rows=975 width=5) (actual time=0.506..0.758 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..30.18 rows=975 width=0) (actual time=0.490..0.490 rows=285 loops=1)
        Index Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
Planning Time: 0.077 ms
Execution Time: 0.793 ms

I have tried:

  1. CTE (with statements), same performance
  2. Join statement, same performance

EDIT: Using a functions work

CREATE 
    OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
    ticker 
FROM
    tickers_profile 
WHERE
    short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) ) 
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
        Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms

CodePudding user response:

PostgreSQL could use an index there, but it prefers not to, probably because there are some values for daily_prices.ticker that occur very frequently and would make an index scan a bad choice. The optimizer doesn't know which value the subquery will return. Perhaps split the job into two queries, then the optimizer knows when it plans the second query.

If you know that none of the frequent values can be returned, you could add a WHERE condition like AND dp.ticker NOT IN ('frequent value 1', 'frequent value 2').

CodePudding user response:

Using a function works for my case

CREATE 
    OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
    ticker 
FROM
    tickers_profile 
WHERE
    short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) ) 
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
        Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms
  • Related