Home > front end >  How to get a single record per group from one day ago in a very large MySQL table?
How to get a single record per group from one day ago in a very large MySQL table?

Time:02-14

Suppose I have a table containing various types of fruit and vegetables. Once every 5-10 minutes, I am checking the current price of all items I have listed, and I insert a new record for that timestamp with the current price.

The following is an example of such a table (number of rows reduced for readability, but imagine there are entries every 5 minutes for each fruit):

| slug   | price | time                |
 -------- ------- --------------------- 
| tomato | 1.5   | 2022-02-05 18:05:00 |
| tomato | 1.5   | 2022-02-05 21:05:00 |
| tomato | 1.55  | 2022-02-06 01:05:00 |
| tomato | 1.7   | 2022-02-06 08:05:00 |
| tomato | 1.65  | 2022-02-06 14:05:00 |
| tomato | 1.63  | 2022-02-07 02:05:00 |
| tomato | 1.69  | 2022-02-07 10:05:00 |
| tomato | 1.7   | 2022-02-07 18:05:00 |
| tomato | 1.49  | 2022-02-08 04:05:00 |
| tomato | 1.58  | 2022-02-08 18:05:00 |
| kiwi   | 0.9   | 2022-02-05 18:05:00 |
| kiwi   | 0.95  | 2022-02-05 21:05:00 |
| kiwi   | 0.81  | 2022-02-06 01:05:00 |
| kiwi   | 1.01  | 2022-02-06 08:05:00 |
| kiwi   | 1.05  | 2022-02-06 14:05:00 |
| kiwi   | 1.1   | 2022-02-07 02:05:00 |
| kiwi   | 1.08  | 2022-02-07 10:05:00 |
| kiwi   | 0.95  | 2022-02-07 18:05:00 |
| kiwi   | 1.04  | 2022-02-08 04:05:00 |
| kiwi   | 1.15  | 2022-02-08 18:05:00 |
| lemon  | 1.69  | 2022-02-05 18:05:00 |
| lemon  | 1.3   | 2022-02-05 21:05:00 |
| lemon  | 1.35  | 2022-02-06 01:05:00 |
| lemon  | 1.35  | 2022-02-06 08:05:00 |
| lemon  | 1.3   | 2022-02-06 14:05:00 |
| lemon  | 1.35  | 2022-02-07 02:05:00 |
| lemon  | 1.41  | 2022-02-07 10:05:00 |
| lemon  | 1.49  | 2022-02-07 18:05:00 |
| lemon  | 1.5   | 2022-02-08 04:05:00 |
| lemon  | 1.49  | 2022-02-08 18:05:00 |

Now, I want to select all items that I have, and show what their price was 24 hours ago. Using the following query, I can fetch a single entry per each such item that matches the "1 day ago" criteria:

SELECT slug, price, time
FROM items 
WHERE time >= NOW() - INTERVAL 1 DAY
GROUP BY slug

The problem, however, is that over time the size of the table increases substantially, and this query, that used to take a fraction of a second, now takes 5-10 seconds (in my current table I have roughly 9 million records like these, with some additional columns of course).

Is there a more optimal way to execute this?

CodePudding user response:

I guess you want to display ....

  • the slug, price, and time of
  • the oldest row in the table
  • that's newer than one day ago
  • for each slug.

First, get the times of the records you want with this subquery.

         SELECT MIN(time) time, slug
           FROM items
          WHERE time >= NOW() - INTERVAL 24 HOUR
          GROUP BY slug

Then join that subquery to your table like this.

SELECT items.slug, items.price, items.time
  FROM items
  JOIN (
         SELECT MIN(time) time, slug
           FROM items
          WHERE time >= NOW() - INTERVAL 1 DAY
          GROUP BY slug
       ) h ON items.slug = h.slug AND items.time = h.time
 ORDER BY slug

This index will help you make this fast. The server can jump immediately to the relevant rows by time.

ALTER TABLE CREATE INDEX timeslug (time, slug);
  • Related