I have a table named assets. Here is the ddl:
create table assets (
id bigint primary key,
name varchar(255) not null,
value double precision not null,
business_time timestamp with time zone,
insert_time timestamp with time zone default now() not null
);
create index idx_assets_name on assets (name);
I need to extract the newest (based on insert_time) value for each asset name. This is the query that I initially used:
SELECT DISTINCT
ON (a.name) *
FROM home.assets a
WHERE a.name IN (
'USD_RLS',
'EUR_RLS',
'SEKKEH_RLS',
'NIM_SEKKEH_RLS',
'ROB_SEKKEH_RLS',
'BAHAR_RLS',
'GOLD_18_RLS',
'GOLD_OUNCE_USD',
'SILVER_OUNCE_USD',
'PLATINUM_OUNCE_USD',
'GOLD_MESGHAL_RLS',
'GOLD_24_RLS',
'STOCK_IR',
'AED_RLS',
'GBP_RLS',
'CAD_RLS',
'CHF_RLS',
'TRY_RLS',
'AUD_RLS',
'JPY_RLS',
'CNY_RLS',
'RUB_RLS',
'BTC_USD'
)
ORDER BY a.name,
a.insert_time DESC;
I have around 300,000 rows in the assets table. On my VPS this query takes about 800 ms. this is causing a whole response time of about 1 second for a specific endpoint. This is a bit slow and considering the fact that the assets table is growing fast, this endpoint will be even slower in the near future. I also tried to avoid IN(...)
using this query:
SELECT DISTINCT
ON (a.name) *
FROM home.assets a
ORDER BY a.name,
a.insert_time DESC;
But I didn't notice a significant difference. Any idea how I could optimize this query?
CodePudding user response:
You may try adding the following index to your table:
CREATE INDEX idx ON assets (name, insert_time DESC);
If used, Postgres can simply scan this index to find the distinct record having the most recent insert_time
for each name.
CodePudding user response:
For more than a few rows per name
in the table (looks to be so), I expect this query to be substantially faster, yet:
SELECT a.*
FROM unnest('{USD_RLS, EUR_RLS, SEKKEH_RLS, NIM_SEKKEH_RLS, ROB_SEKKEH_RLS
, BAHAR_RLS, GOLD_18_RLS, GOLD_OUNCE_USD, SILVER_OUNCE_USD
, PLATINUM_OUNCE_USD, GOLD_MESGHAL_RLS, GOLD_24_RLS, STOCK_IR
, AED_RLS, GBP_RLS, CAD_RLS, CHF_RLS
, TRY_RLS, AUD_RLS, JPY_RLS, CNY_RLS
, RUB_RLS, BTC_USD}'::text[]) AS n(name)
CROSS JOIN LATERAL (
SELECT *
FROM home.assets a
WHERE a.name = n.name
ORDER BY a.insert_time DESC
LIMIT 1
) a;
Pass your list as array, unnest, and then get each latest row in a LATERAL
subquery. The CROSS JOIN
eliminates names that are not found at all. (You might be interested in LEFT JOIN LATERAL ... ON true
instead, to keep those in the result.)
You still need the multicolumn index that Tim mentioned.
CREATE INDEX ON assets (name, insert_time DESC);
Default ascending sort order would work, too, in this case. Postgres can scan backwards:
CREATE INDEX ON assets (name, insert_time);
See:
- Postgres: getting latest rows for an array of keys
- Optimize GROUP BY query to retrieve latest row per user - basically type 2a
- What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Not the number of rows in the table, but the number of rows per group (per name
in your case) decides whether DISTINCT ON
is the best choice. See this benchmark comparing relevant query styles: