Optimize correlated subquery for order events table-CodePudding

I have a MariaDB table with order events:

EVENTID	ORDERID	DATA	TIMESTAMP
1	1	'stuff1'	2021-12-17 11:48:00.000
2	1	'newstuff1'	2021-12-17 11:49:00.000
3	1	'newerstuff1'	2021-12-17 11:49:30.000
4	2	'stuff2'	2021-12-17 11:50:00.000
5	3	'stuff3'	2021-12-17 11:51:10.000
6	3	'newstuff3'	2021-12-17 11:52:00.000

And I need to get the most recent event for every order. So for this dataset the result should be

EVENTID	ORDERID	DATA	TIMESTAMP
3	1	'newerstuff1'	2021-12-17 11:49:30.000
4	2	'stuff2'	2021-12-17 11:50:00.000
6	3	'newstuff3'	2021-12-17 11:52:00.000

I am achieving this using a correlated subquery:

SELECT *
FROM MESSAGES m1
WHERE TIMESTAMP = (SELECT MAX(TIMESTAMP) 
FROM MESSAGES m2 WHERE m1.ORDERID = m2.ORDERID);

But this is slow for large volumes so I'd like to know if there is a way to improve the query using a join or grouping. Otherwise I'm open to splitting the data into different tables if that would help.

CodePudding user response：

SELECT *
FROM table t1
WHERE NOT EXISTS ( SELECT NULL
                   FROM table t2
                   WHERE t1.orderid = t2.orderid
                     AND t1.`timestamp` < t2.`timestamp` )

I.e. simply select the row if there is no another row with the same order id and with more recent date.

PS. The index by (orderid, `timestamp`) will improve.

CodePudding user response：

SELECT m3.*
    FROM ( SELECT ORDERID, MAX(TIMESTAMP) AS TIMESTAMP
              FROM MESSAGES m1
              GROUP BY ORDERID ) m2
    JOIN MESSAGES m3 USING(ORDERID, TIMESTAMP);

and have

INDEX(ORDERID, TIMESTAMP)

The query is a "groupwise-max" pattern.

The derived table (subquery) will hop through the index, then look up the rest of the columns from the table.

If there are any duplicate timestamps for a given order id, it will deliver multiple rows. If that is not desired, see this for more options.