Home > other >  How do I optimize MYSQL select statements which execute 4000 times in total in a table with 50M rows
How do I optimize MYSQL select statements which execute 4000 times in total in a table with 50M rows

Time:11-08

This is how my code looks like:

foreach ($instruments as $instrument) {
    $stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? AND date >= ? ORDER BY date asc LIMIT 1");
    $stmt->execute([xyzToExchange($instrument2), xyzToSymbol($instrument2), $startDate]);
    $data1 = $stmt->fetch(PDO::FETCH_ASSOC);

    $stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? ORDER BY date desc LIMIT 1");
    $stmt->execute([xyzToExchange($instrument2), xyzToSymbol($instrument2)]);
    $data2 = $stmt->fetch(PDO::FETCH_ASSOC);
}

There are around 2000 instruments that are string in this format "NASDAQ:AAPL".

It currently takes 7 seconds to complete since the database has around 50 million rows.

So far:

  1. I have set INDEX for exchange, symbol and date together.

  2. Set another INDEX for exchange and symbol together.

I want to ask further what can I do to optimize this query.

Note:

  1. The function which this code is part of tries to find the price difference and the percent change between the start date and today's date. The start date can be anything like 6 months ago, 3 months ago.

  2. I tried merging them in one large query and then executing them. Still same problem.

Update:

  1. EXPLAIN for both queries

enter image description here enter image description here

  1. Table Schema
    CREATE TABLE `ehd_historical_data` (
    `exchange` varchar(255) NOT NULL,
    `symbol` varchar(255) NOT NULL,
    `date` date NOT NULL,
    `open` decimal(20,10) NOT NULL,
    `high` decimal(20,10) NOT NULL,
    `low` decimal(20,10) NOT NULL,
    `close` decimal(20,10) NOT NULL,
    `adjusted_close` decimal(20,10) NOT NULL,
    `volume` decimal(20,0) NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;


    ALTER TABLE `ehd_historical_data`
    ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`),
    ADD KEY `exchange` (`exchange`),
    ADD KEY `date` (`date`),
    ADD KEY `symbol` (`symbol`),
    ADD KEY `exchange_3` (`exchange`,`symbol`);
    COMMIT;

CodePudding user response:

Try selecting both rows in a single query using row_number()

select * 
from (
  SELECT date, adjusted_close, close,
    row_number() over(order by date desc) rn1,
    row_number() over(order by date asc) rn2
  FROM ehd_historical_data 
  WHERE exchange = ? AND symbol = ? AND date >= ? 
) t
where rn1 = 1 or rn2 = 1

You may also request all symbols at once. Note a partition clause

select * 
from (
  SELECT exchange, symbol, date, adjusted_close, close,
    row_number() over(partition by exchange, symbol order by date desc) rn1,
    row_number() over(partition by exchange, symbol order by date asc) rn2
  FROM ehd_historical_data 
  WHERE ((exchange = 'NASDAQ' AND symbol = 'AAPL') OR (exchange = 'NASDAQ' AND symbol = 'MSFT') OR (exchange = 'NASDAQ' AND symbol = 'TSLA')) AND date >= ? 
) t
where rn1 = 1 or rn2 = 1

CodePudding user response:

Your index (exchange,symbol,date) is optimal for both of those SELECTs, so let's dig into other causes for sluggishness.

CREATE TABLE `ehd_historical_data` (
`exchange` varchar(255) NOT NULL,   -- Don't use 255 if you don't need it
`symbol` varchar(255) NOT NULL,     -- ditto
`date` date NOT NULL,
`open` decimal(20,10) NOT NULL,     -- overkill
`high` decimal(20,10) NOT NULL,
`low` decimal(20,10) NOT NULL,
`close` decimal(20,10) NOT NULL,
`adjusted_close` decimal(20,10) NOT NULL,
`volume` decimal(20,0) NOT NULL
) ENGINE=InnoDB DEFAULT
   CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;  -- probably all are ascii


ALTER TABLE `ehd_historical_data`
ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`),  -- Change to PRIMARY KEY
ADD KEY `exchange` (`exchange`),  -- redundant, DROP
ADD KEY `date` (`date`),
ADD KEY `symbol` (`symbol`),
ADD KEY `exchange_3` (`exchange`,`symbol`);  -- redundant, DROP
COMMIT;

Have another table of symbols; use a MEDIUMINT UNSIGNED in this table.

decimal(20,10) takes 10 bytes; I know of no symbol that needs that much precision or range.

The above comments are aimed at making the table smaller. If the table is currently bigger than will fit in cache, I/O will be the cause of sluggishness.

How much RAM do you have? What is the value of `innodb_buffer_pool_size?

How fast does this run? (I'm thinking there might be a way to get all 2000 results in a single SQL. This might be a component of it.)

        SELECT  exchange, symbol,
                MIN(date) AS date1,
                MAX(date) AS date2
            FROM  ehd_historical_data 
            WHERE date > ?
            GROUP BY  exchange, symbol

That would be JOINed back to the table twice, once for each date.

  • Related