I have a MariaDB table with order events:
EVENTID | ORDERID | DATA | TIMESTAMP |
---|---|---|---|
1 | 1 | 'stuff1' | 2021-12-17 11:48:00.000 |
2 | 1 | 'newstuff1' | 2021-12-17 11:49:00.000 |
3 | 1 | 'newerstuff1' | 2021-12-17 11:49:30.000 |
4 | 2 | 'stuff2' | 2021-12-17 11:50:00.000 |
5 | 3 | 'stuff3' | 2021-12-17 11:51:10.000 |
6 | 3 | 'newstuff3' | 2021-12-17 11:52:00.000 |
And I need to get the most recent event for every order. So for this dataset the result should be
EVENTID | ORDERID | DATA | TIMESTAMP |
---|---|---|---|
3 | 1 | 'newerstuff1' | 2021-12-17 11:49:30.000 |
4 | 2 | 'stuff2' | 2021-12-17 11:50:00.000 |
6 | 3 | 'newstuff3' | 2021-12-17 11:52:00.000 |
I am achieving this using a correlated subquery:
SELECT *
FROM MESSAGES m1
WHERE TIMESTAMP = (SELECT MAX(TIMESTAMP)
FROM MESSAGES m2 WHERE m1.ORDERID = m2.ORDERID);
But this is slow for large volumes so I'd like to know if there is a way to improve the query using a join or grouping. Otherwise I'm open to splitting the data into different tables if that would help.
CodePudding user response:
SELECT *
FROM table t1
WHERE NOT EXISTS ( SELECT NULL
FROM table t2
WHERE t1.orderid = t2.orderid
AND t1.`timestamp` < t2.`timestamp` )
I.e. simply select the row if there is no another row with the same order id and with more recent date.
PS. The index by (orderid, `timestamp`)
will improve.
CodePudding user response:
SELECT m3.*
FROM ( SELECT ORDERID, MAX(TIMESTAMP) AS TIMESTAMP
FROM MESSAGES m1
GROUP BY ORDERID ) m2
JOIN MESSAGES m3 USING(ORDERID, TIMESTAMP);
and have
INDEX(ORDERID, TIMESTAMP)
The query is a "groupwise-max" pattern.
The derived table (subquery) will hop through the index, then look up the rest of the columns from the table.
If there are any duplicate timestamps for a given order id, it will deliver multiple rows. If that is not desired, see this for more options.