How do I flatten the complexity of this select statement in postgresql?


I have the following table:

create table account_values
(
    account_id   bigint    not null,
    timestamp    timestamp not null,
    value1       numeric   not null,
    value2       numeric   not null,
    primary key (timestamp, account_id)
);

I also have the following query, which, for each point in an evenly spaced generated series, produces an array containing that point's epoch and the value1 + value2 of the row with the closest timestamp at or before it:

select array [(trunc(extract(epoch from gs) * 1000))::text, COALESCE((equity.value1 + equity.value2), 0.000000)::text]
from generate_series((now() - '1 year'::interval)::timestamp, now(), interval '1 day') gs
         left join lateral (select value1, value2
                            from account_values
                            where timestamp <= gs and account_id = ?
                            order by timestamp desc
                            limit 1) equity on (TRUE);

The issue with this method of generating such an array becomes apparent when inspecting the output of explain analyse:

Nested Loop Left Join  (cost=0.45..3410.74 rows=1000 width=32) (actual time=0.134..3948.546 rows=366 loops=1)
  ->  Function Scan on generate_series gs  (cost=0.02..10.02 rows=1000 width=8) (actual time=0.075..0.244 rows=366 loops=1)
  ->  Limit  (cost=0.43..3.36 rows=1 width=26) (actual time=10.783..10.783 rows=1 loops=366)
        ->  Index Scan Backward using account_values_pkey on account_values  (cost=0.43..67730.27 rows=23130 width=26) (actual time=10.782..10.782 rows=1 loops=366)
"              Index Cond: ((""timestamp"" <= gs.gs) AND (account_id = 459))"
Planning Time: 0.136 ms
Execution Time: 3948.659 ms

Specifically: loops=366

This problem will only get worse if I ever decide to decrease my generated series interval time.

Is there a way to flatten this looped select into a more efficient query? If not, what are some other approaches I can take to improving the performance?

Edit:

One hard requirement is that the result of the statement cannot be altered. For example, I don't want the range to round to the closest day. The range should always start the second the statement is invoked, with each point in the series falling precisely one day before the previous one.
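
A side note on the plan itself: each of the 366 loops walks the (timestamp, account_id) primary key backwards, and account_id, as a non-leading column, can only be applied as an in-scan filter. A minimal sketch, assuming an extra index is acceptable (the index name is illustrative):

-- Hypothetical index, not part of the schema above: with account_id
-- leading, the lateral lookup (account_id = ? and timestamp <= gs,
-- order by timestamp desc, limit 1) becomes one short index descent
-- per generated point instead of a long backward walk over the PK.
create index account_values_account_id_timestamp_idx
    on account_values (account_id, timestamp);

This changes neither the query shape nor its result, so the hard requirement above is unaffected.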

CodePudding user response:

Based on Edouard's answer:

with a(timestamp, values_agg) as
    (select timestamp, array_agg(value1 + value2) as values_agg
     from account_values
     where account_id = ?
       and timestamp <@ tsrange(now()::timestamp - '1 year'::interval, now()::timestamp)
     group by 1)
select array [ (trunc(extract(epoch from timestamp) * 1000))::text
             , coalesce(values_agg[1], 0.000000)::text ]
from a;

CodePudding user response:

Not sure you will get the exact same result, but it should be faster:

select array [ (trunc(extract(epoch from date_trunc('day', timestamp)) * 1000))::text
             , (array_agg(value1 + value2 ORDER BY timestamp DESC))[1] :: text
             ]
  from account_values
 where account_id = ?
   and timestamp <@ tsrange(now()::timestamp - '1 year'::interval, now()::timestamp)
 group by date_trunc('day', timestamp);
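
If grouping by day drifts too far from the hard requirement above (the series must be anchored to the exact invocation time), a single-scan variant is to turn each stored row into a validity span with lead() and join the generated series against those spans. This is a sketch under the same schema, not tested against the original data:

-- Sketch: one pass over account_values builds [valid_from, valid_to)
-- spans per row; each generated point then joins to the row that was
-- current at that instant, mirroring the original lateral's semantics.
with spans as (
    select "timestamp"                                  as valid_from,
           lead("timestamp", 1, 'infinity'::timestamp)
               over (order by "timestamp")              as valid_to,
           value1 + value2                              as total
    from account_values
    where account_id = ?
)
select array [ (trunc(extract(epoch from gs) * 1000))::text
             , coalesce(s.total, 0.000000)::text ]
from generate_series((now() - interval '1 year')::timestamp,
                     now()::timestamp,
                     interval '1 day') gs
left join spans s on gs >= s.valid_from and gs < s.valid_to;

Whether this beats the original depends on row counts: it trades 366 index probes for one scan of the account's rows plus an in-memory join, so compare both with explain analyse.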