Postgres index for aggregate query-CodePudding

SELECT count(e_id) AS count,
       e_id
FROM   test
WHERE  created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
       AND std IN ( '1' )
       AND section IN ( 'Sample' )
GROUP  BY e_id
ORDER  BY count DESC
LIMIT  4

The table has around 1 M records. The query execution is less than 40 ms but computation takes a hit at the group by and query cost high.

Limit  (cost=26133.76..26133.77 rows=4 width=45) (actual time=52.300..52.303 rows=3 loops=1)
  ->  Sort  (cost=26133.76..26134.77 rows=403 width=45) (actual time=52.299..52.301 rows=3 loops=1)
        Sort Key: (count(e_id)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  GroupAggregate  (cost=26120.66..26127.72 rows=403 width=45) (actual time=52.287..52.289 rows=3 loops=1)
              Group Key: e_id
              ->  Sort  (cost=26120.66..26121.67 rows=404 width=37) (actual time=52.281..52.283 rows=5 loops=1)
                    Sort Key: e_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Bitmap Heap Scan on test  (cost=239.19..26103.17 rows=404 width=37) (actual time=49.339..52.261 rows=5 loops=1)
                          Recheck Cond: ((section)::text = 'test'::text)
"                          Filter: ((created_at >= '2021-12-01 00:00:00'::timestamp without time zone) AND (created_at <= '2021-12-08 00:00:00'::timestamp without time zone) AND ((std)::text = ANY ('{1,2}'::text[])))"
                          Rows Removed by Filter: 38329
                          Heap Blocks: exact=33997
                          ->  Bitmap Index Scan on index_test_on_section  (cost=0.00..239.09 rows=7270 width=0) (actual time=6.815..6.815 rows=38334 loops=1)
                                Index Cond: ((section)::text = 'test'::text)

How can I optimize the group by and count, so that CPU does not shoot up?

CodePudding user response：

The best index for this query is

CREATE INDEX ON test (section, created_at, std) INCLUDE (e_id);

Then VACUUM the table and try again.

CodePudding user response：

Unless you have shown us the wrong plan, the slow step is not the group by, but rather the Bitmap Heap Scan

Your index on "section" returns 38334, of which all but 5 are filtered out. We can't tell if they are filtered out mostly by the "std" criterion or the "created_at" one. You need a more specific multicolumn index. The one i think is most likely to be effective is on (section, std, created_at).