I have a table where I store rows with external ids, and I frequently need to select the latest timestamp for a given set of external ids. This has become a bottleneck for my app.
Query:
SELECT DISTINCT ON ("T1"."external_id") "T1"."external_id", "T1"."timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
ORDER BY "T1"."external_id" ASC, "T1"."timestamp" DESC
Explain:
Unique  (cost=169123.13..169123.19 rows=12 width=18) (actual time=1327.443..1334.118 rows=2 loops=1)
  ->  Sort  (cost=169123.13..169123.16 rows=12 width=18) (actual time=1327.441..1334.112 rows=2 loops=1)
        Sort Key: external_id, timestamp DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1000.00..169122.91 rows=12 width=18) (actual time=752.577..1334.056 rows=2 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=921.649..1300.556 rows=1 loops=3)
                    Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                    Rows Removed by Filter: 1168882
Planning Time: 0.592 ms
Execution Time: 1334.159 ms
What could I do to make this query faster? Or should I use a completely different query altogether?
UPDATE:
Added a new query plan as requested by @jahrl. The query looks faster here, but the previous plan was captured under load; now both queries take a similar amount of time.
Finalize GroupAggregate  (cost=169121.80..169123.21 rows=12 width=18) (actual time=321.009..322.410 rows=2 loops=1)
  Group Key: external_id
  ->  Gather Merge  (cost=169121.80..169123.04 rows=10 width=18) (actual time=321.003..322.403 rows=2 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (cost=168121.77..168121.86 rows=5 width=18) (actual time=318.671..318.672 rows=1 loops=3)
              Group Key: external_id
              ->  Sort  (cost=168121.77..168121.78 rows=5 width=18) (actual time=318.664..318.665 rows=1 loops=3)
                    Sort Key: external_id
                    Sort Method: quicksort  Memory: 25kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=144.338..318.611 rows=1 loops=3)
                          Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                          Rows Removed by Filter: 1170827
Planning Time: 0.093 ms
Execution Time: 322.441 ms
CodePudding user response:
Perhaps a basic GROUP BY query will perform better?
SELECT "T1"."external_id", MAX("T1"."timestamp") as "timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC
And, as @melcher said, don't forget an ("external_id", "timestamp") index!
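A minimal sketch of that index, assuming the table and column names from the question (the index name itself is made up):

-- Composite index: "external_id" first for the equality filter,
-- "timestamp" DESC so the latest value per id sits at the front of each run.
CREATE INDEX "T1_ext_id_ts_idx" ON "T1" ("external_id", "timestamp" DESC);

With this index, both the GROUP BY/MAX query above and the original DISTINCT ON query (whose ORDER BY "external_id" ASC, "timestamp" DESC matches the index order) can be answered from the index instead of the parallel sequential scan shown in the plans.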
CodePudding user response:
Look at the number of rows removed by the filter and create an index on external_id.
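For example, a single-column version under the same assumptions (hypothetical index name):

-- Index just the filter column so matching rows are found by an index scan.
CREATE INDEX "T1_external_id_idx" ON "T1" ("external_id");

The plans above show each parallel worker reading and discarding roughly 1.17 million rows on every execution; with an index on "external_id", only the matching rows need to be fetched.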