I have a fairly simple query which gets the top ten scores of a large table (just under 10 million rows) and returns them in descending order. The scores are made up of sums aggregated using a group by clause, and the group by seems especially costly, using Sort Method: external merge Disk: 190080k
Is there some way I can speed this up? I already have indices on user.test_id and user.score (descending). I'd prefer not to change work_mem
as I have limited control over our postgres settings.
Query:
select
(select test from test where top_scores.test_id = test.id),
(select type from test where top_scores.test_id = test.id),
sum_score
from (select sum(score) as sum_score,
test_id
from user
group by test_id
order by sum_score desc
limit 10
) top_scores
Query Plan:
Subquery Scan on top_scores (cost=1412662.62..1412831.69 rows=10 width=16) (actual time=164098.107..164098.714 rows=10 loops=1)"
-> Limit (cost=1412662.62..1412662.64 rows=10 width=16) (actual time=164098.042..164098.144 rows=10 loops=1)"
-> Sort (cost=1412662.62..1419366.96 rows=2681736 width=16) (actual time=164098.033..164098.067 rows=10 loops=1)"
Sort Key: (sum(user.score)) DESC"
Sort Method: top-N heapsort Memory: 25kB"
-> GroupAggregate (cost=1271799.65..1354711.27 rows=2681736 width=16) (actual time=72815.313..152605.093 rows=2499234 loops=1)"
Group Key: user.test_id"
-> Sort (cost=1271799.65..1290497.74 rows=7479234 width=16) (actual time=72815.273..107823.507 rows=7479234 loops=1)"
Sort Key: user.test_id"
Sort Method: external merge Disk: 190080kB"
-> Seq Scan on user (cost=0.00..162238.34 rows=7479234 width=16) (actual time=0.009..33795.669 rows=7479234 loops=1)"
SubPlan 1"
-> Index Scan using test_id_idx on test (cost=0.43..8.45 rows=1 width=14) (actual time=0.012..0.016 rows=1 loops=10)"
Index Cond: (top_scores.test_id = id)"
SubPlan 2"
-> Index Scan using test_id_idx on test test_1 (cost=0.43..8.45 rows=1 width=3) (actual time=0.006..0.010 rows=1 loops=10)"
Index Cond: (top_scores.test_id = id)"
Planning time: 0.724 ms"
Execution time: 164135.458 ms"
CodePudding user response:
Try:
select
test, type, sum_score
from
test
join
(select
sum(score) as sum_score,
test_id
from
user
group by
test_id
order by
sum_score desc
limit
10
) top_scores
on
test.test_id = top_scores.test_id
CodePudding user response:
I wouldn't expect separate indexes on test_id and score to help here. But a multicolumn index on (test_id, score)
should get use of an index-only scan and so avoid the sort. If it doesn't immediately help, then VACUUM the table to get the visibility map bits set.
Also, your hardware appears to be spectacularly bad, or maybe just extremely overloaded.