Why isn't postgresql using an index with my group by aggregate?-CodePudding

I have a table in a postgresql database with the timescaledb extension enabled that looks like:

 ------------ -------------------------- ------------- 
| Column     | Type                     | Modifiers   |
|------------ -------------------------- -------------|
| time       | timestamp with time zone |  not null   |
| value      | double precision         |  not null   |
| being      | metric_being             |  not null   |
| device     | integer                  |  not null   |
 ------------ -------------------------- -------------

And an index on the table:

"metrics_device_time_idx" btree (device, "time" DESC)

But when I query the table using a Group By:

explain select max(time), device from metrics group by device;

It does not use the index:

 ---------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------- 
| QUERY PLAN                                                                                                        |
|-------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate  (cost=104577.41..104588.61 rows=22 width=12)                                             |
|   Group Key: _hyper_9_95_chunk.device                                                                             |
|   ->  Gather Merge  (cost=104577.41..104587.95 rows=88 width=12)                                                  |
|         Workers Planned: 4                                                                                        |
|         ->  Sort  (cost=103577.35..103577.41 rows=22 width=12)                                                    |
|               Sort Key: _hyper_9_95_chunk.device                                                                  |
|               ->  Partial HashAggregate  (cost=103576.64..103576.86 rows=22 width=12)                             |
|                     Group Key: _hyper_9_95_chunk.device                                                           |
|                     ->  Parallel Append  (cost=0.00..95035.06 rows=1708317 width=12)                              |
|                           ->  Parallel Seq Scan on _hyper_9_95_chunk  (cost=0.00..44602.70 rows=1122370 width=12) |
|                           ->  Parallel Seq Scan on _hyper_9_92_chunk  (cost=0.00..24807.61 rows=756061 width=12)  |
 -------------------------------------------------------------------------------------------------------------------

And ends up begin sort of slow. What is really about 10x faster, on the other hand, is

select max(time), 29 from metrics where device = 29
union
select max(time), 30 from metrics where device = 30
union
...

Why is this the case? Can I change my index or query to speed-up the query using the group by? Why is the union so much faster?

CodePudding user response：

Postgres cannot to use index for this case. This is just not supported by optimizer now. You can find some info about this - there are patch named "index skip scan", but this work is not finished yet. You can use some workarounds.

CodePudding user response：

As @Pavel Stehule mentions in his answer, Postgres does not implement index skip scans, which are necessary to optimize these types of queries. Timescaledb recognized that these types of queries are really helpful in timeseries analysis, so they implemented an index skip scan themselves. It is present in their extension from version 2.2.1 onward, see their blog post about it here.

After upgrading the extension to >= 2.2.1, the query can be rewritten to use the index skip scan:

select distinct on (device) device, time from metrics order by device, time desc

This then uses their index skip scan implementation, and in my case sped up the query by around 100x.