Suppose I have a table containing various types of fruit and vegetables. Once every 5-10 minutes, I am checking the current price of all items I have listed, and I insert a new record for that timestamp with the current price.
The following is an example of such a table (number of rows reduced for readability, but imagine there are entries every 5 minutes for each fruit):
| slug | price | time |
-------- ------- ---------------------
| tomato | 1.5 | 2022-02-05 18:05:00 |
| tomato | 1.5 | 2022-02-05 21:05:00 |
| tomato | 1.55 | 2022-02-06 01:05:00 |
| tomato | 1.7 | 2022-02-06 08:05:00 |
| tomato | 1.65 | 2022-02-06 14:05:00 |
| tomato | 1.63 | 2022-02-07 02:05:00 |
| tomato | 1.69 | 2022-02-07 10:05:00 |
| tomato | 1.7 | 2022-02-07 18:05:00 |
| tomato | 1.49 | 2022-02-08 04:05:00 |
| tomato | 1.58 | 2022-02-08 18:05:00 |
| kiwi | 0.9 | 2022-02-05 18:05:00 |
| kiwi | 0.95 | 2022-02-05 21:05:00 |
| kiwi | 0.81 | 2022-02-06 01:05:00 |
| kiwi | 1.01 | 2022-02-06 08:05:00 |
| kiwi | 1.05 | 2022-02-06 14:05:00 |
| kiwi | 1.1 | 2022-02-07 02:05:00 |
| kiwi | 1.08 | 2022-02-07 10:05:00 |
| kiwi | 0.95 | 2022-02-07 18:05:00 |
| kiwi | 1.04 | 2022-02-08 04:05:00 |
| kiwi | 1.15 | 2022-02-08 18:05:00 |
| lemon | 1.69 | 2022-02-05 18:05:00 |
| lemon | 1.3 | 2022-02-05 21:05:00 |
| lemon | 1.35 | 2022-02-06 01:05:00 |
| lemon | 1.35 | 2022-02-06 08:05:00 |
| lemon | 1.3 | 2022-02-06 14:05:00 |
| lemon | 1.35 | 2022-02-07 02:05:00 |
| lemon | 1.41 | 2022-02-07 10:05:00 |
| lemon | 1.49 | 2022-02-07 18:05:00 |
| lemon | 1.5 | 2022-02-08 04:05:00 |
| lemon | 1.49 | 2022-02-08 18:05:00 |
Now, I want to select all items that I have, and show what their price was 24 hours ago. Using the following query, I can fetch a single entry per each such item that matches the "1 day ago" criteria:
SELECT slug, price, time
FROM items
WHERE time >= NOW() - INTERVAL 1 DAY
GROUP BY slug
The problem, however, is that over time the size of the table increases substantially, and this query, that used to take a fraction of a second, now takes 5-10 seconds (in my current table I have roughly 9 million records like these, with some additional columns of course).
Is there a more optimal way to execute this?
CodePudding user response:
I guess you want to display ....
- the slug, price, and time of
- the oldest row in the table
- that's newer than one day ago
- for each slug.
First, get the times of the records you want with this subquery.
SELECT MIN(time) time, slug
FROM items
WHERE time >= NOW() - INTERVAL 24 HOUR
GROUP BY slug
Then join that subquery to your table like this.
SELECT items.slug, items.price, items.time
FROM items
JOIN (
SELECT MIN(time) time, slug
FROM items
WHERE time >= NOW() - INTERVAL 1 DAY
GROUP BY slug
) h ON items.slug = h.slug AND items.time = h.time
ORDER BY slug
This index will help you make this fast. The server can jump immediately to the relevant rows by time.
ALTER TABLE CREATE INDEX timeslug (time, slug);