Home > Software design >  How to speed up queries with index design for a certain table?
How to speed up queries with index design for a certain table?

Time:06-14

Forgive me I am very new to indexes and DB. I have a table with such columns :

    date        timestamp without time zone,
    date_num                         bigint,
    value                  double precision,
    name                               text,
    market                             text,
    type                               text,
    UNIQUE(name,date_num)

This table is :

  1. Barely updated, only once a day with a single new row per name.
  2. Have 6-10 million rows
  3. For column name have many rows each for a unique day of the year. For example the name 'companyA' have 1250 rows, each row has a unique date/date_num.
  4. date_num is a timestamp in millisecond for a certain day of the year we use it for search, sometimes we use date.

One of the things we search is "find names with the highest revenue between dates", which mean for a name = "companyA" we will calculate:

revenue = (`value` of companyA in 14/2/2022 - `value` of companyA in 14/2/2021)

We then need to find the best 50 names with the highest such revenue.

For some reason the task takes up to 13 seconds, and I'v seen others doing this in 1 second.

  1. What indexes are reasonable for such scenario?
  2. What indexes are good in case we need to find/calculate many variations of date/name/value like this ?

Here is a query that need to find revenue of all names of a certain type. This query is not ideal because it serves many types of queries and originally contains parameters to change the sql string per query

        WITH BS AS (
        SELECT date_num, name, value,
        first_value(value) over (PARTITION BY name ORDER BY date_num) as o,
        first_value(value) over (PARTITION BY name ORDER BY date_num DESC) as c,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_num DESC) as rn
        FROM historical
        WHERE date_num >= 1609459200 AND date_num <= 1640995200 AND type = 'typeA'
        )
        SELECT name, date_num, CASE WHEN o=0 THEN null ELSE 100 * ( (c - o)/o ) END as out_return
        FROM BS
        WHERE BS.rn = 1
        ORDER BY out_return  DESC NULLS LAST
        LIMIT 50

PS- the type column has a certain value for 95% of the table, and only another value for the other 5%.

CodePudding user response:

The best you can do to index this query is an index on (type, date_num). Don't store timestamps as numbers.

  • Related