Home > database >  SQL query for finding latest or max value of timestamp from table corresponding to N ids Timescaledb
SQL query for finding latest or max value of timestamp from table corresponding to N ids Timescaledb

Time:01-14

I have a table tab1 in timescale db which has 3 columns tag, time, value. time and tag make up the pk for the table: (time, tag).

There are more than 500 lakhs (50 000 000) of rows. I need to find the latest timestamp or the max(time) for each of the N tags.

There are few things that I tried, I'll share my experience with each:

  1. inner query
SELECT "time", "tag", "value"
FROM tab1 
WHERE ("tag","time") IN 
(SELECT "tag", MAX("time") FROM tab1 WHERE "tag" IN(tag1,tag2) GROUP BY "tag" );

this is giving result but taking around 19 sec to execute which is beyond the acceptable limit

  1. using last function from timescale db
SELECT tag, last(time, time), last(value,time)
FROM tab1
WHERE "tag" IN (tag1,tag2) GROUP BY "tag" ;

this is giving output within 10 sec.

I need to find another feasible solution similar to 2nd one that is perhaps performing better. I tried few other things like LATERAL JOIN (3), WINDOW FUNCTIONS (ROW_NUMBER, PARTITION) (4) but the solution is not as expected.

  1. using lateral gives cross of multiple columns and not one single value with max time as expected. Moreover it is taking 15sec to execute but it might be due to wrong query.
SELECT table1."tag", table1."time",table1."value" from tab1 as table1
join lateral (
    SELECT table2 ."tag",table2 ."time" from tab1 as table2   
    where table2."tag" = table1."tag"
    order by table2."time" desc limit 1
) p on true
where table1."tag" in (tag1,tag2)
  1. when trying partition I want to put limit 1 like this: (partition by tag order by time desc limit 1) but it is giving syntax error. without limit 1 I am not getting the latest time.
SELECT * from 
( SELECT *, row_number() over (partition by tag order by time desc) as rownum
from  tab1) a
where tag in (tag1,tag2)

Can anyone suggest what's wrong with the queries in 3,4 or suggest if there is another alternative.

Index for my table is: enter image description here

CodePudding user response:

There are a couple things that are going to help this and make this query easier and more performant. The first, and probably most important one is the index on the table / hypertable - it will need to be a multi-column index on tag, time desc - the order of the time doesn't matter all that much, but the order of columns in the index matters a lot. tag has to be the first column here because we need to first search by tag, then get the latest time, if we have either separate indexes or if we're ordered by time first, this will be highly inefficient.

You can create this index with a call like this:

CREATE INDEX ON tab1 (tag, "time" DESC);

The next thing is the formulation of the query. The simplest way to get this for each tag is to write a DISTINCT ON query. In Timescale we've optimized this sort of query. It's a bit of a weird way to formulate it so can be a bit hard to find though.

Basically you'd write it something like:

SELECT DISTINCT ON (tag) tag, "time" FROM tab1 ORDER BY tag, "time" DESC;

And that should give you what you want. It's a bit weird, but it will work!

I'm not going to go through the other approaches, but most of them will be improved dramatically with the index, but this is still likely going to be the most performant approach.

If you want to, please post a comment with how this worked and if it sped things up for you!

CodePudding user response:

So figured out the other 2 approaches:

  1. Using lateral

SELECT distinct t_outer.tag, t_top.time, t_top.value from tab1 t_outer join lateral ( SELECT * from tab1 t_inner where t_inner.tag = t_outer.tag order by t_inner.time desc limit 1 ) t_top on true where t_outer.tag in (tag1)

This works but takes more than 14 sec to process.

  1. Using Window functions

SELECT * FROM (SELECT tag,time,"value", rank() OVER (PARTITION BY tag order by time desc ) as RN FROM tab1 WHERE tag IN(tag1) ) as results WHERE results.RN=1;

This also works and takes around 9 sec to process.

When comparing the results inner query and lateral are worst performing queries even for a single tag. So they are eliminated.

Now Last() and Partition() query are our top contenders, till now. Last() is performing better if fewer columns are fetched, otherwise if we fetch all the columns, the execution time of both of them is comparable.

One more point that I would like to add is that using ORDER BY LIMIT query is performing way way better than all the options (execution time is less than 1 sec for single tag) however a drawback is that it wont work for multiple tag input. So in case you need to query for single tag and have similar db configs, this is something you can try.

  • Related