SELECT count(e_id) AS count,
e_id
FROM test
WHERE created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
AND std IN ( '1' )
AND section IN ( 'Sample' )
GROUP BY e_id
ORDER BY count DESC
LIMIT 4
The table has around 1 M records. The query execution is less than 40 ms but computation takes a hit at the group by and query cost high.
Limit (cost=26133.76..26133.77 rows=4 width=45) (actual time=52.300..52.303 rows=3 loops=1)
-> Sort (cost=26133.76..26134.77 rows=403 width=45) (actual time=52.299..52.301 rows=3 loops=1)
Sort Key: (count(e_id)) DESC
Sort Method: quicksort Memory: 25kB
-> GroupAggregate (cost=26120.66..26127.72 rows=403 width=45) (actual time=52.287..52.289 rows=3 loops=1)
Group Key: e_id
-> Sort (cost=26120.66..26121.67 rows=404 width=37) (actual time=52.281..52.283 rows=5 loops=1)
Sort Key: e_id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on test (cost=239.19..26103.17 rows=404 width=37) (actual time=49.339..52.261 rows=5 loops=1)
Recheck Cond: ((section)::text = 'test'::text)
" Filter: ((created_at >= '2021-12-01 00:00:00'::timestamp without time zone) AND (created_at <= '2021-12-08 00:00:00'::timestamp without time zone) AND ((std)::text = ANY ('{1,2}'::text[])))"
Rows Removed by Filter: 38329
Heap Blocks: exact=33997
-> Bitmap Index Scan on index_test_on_section (cost=0.00..239.09 rows=7270 width=0) (actual time=6.815..6.815 rows=38334 loops=1)
Index Cond: ((section)::text = 'test'::text)
How can I optimize the group by and count, so that CPU does not shoot up?
CodePudding user response:
The best index for this query is
CREATE INDEX ON test (section, created_at, std) INCLUDE (e_id);
Then VACUUM
the table and try again.
CodePudding user response:
Unless you have shown us the wrong plan, the slow step is not the group by, but rather the Bitmap Heap Scan
Your index on "section" returns 38334, of which all but 5 are filtered out. We can't tell if they are filtered out mostly by the "std" criterion or the "created_at" one. You need a more specific multicolumn index. The one i think is most likely to be effective is on (section, std, created_at)
.