Home > Blockchain >  Optimize correlated subquery for order events table
Optimize correlated subquery for order events table

Time:12-18

I have a MariaDB table with order events:

EVENTID ORDERID DATA TIMESTAMP
1 1 'stuff1' 2021-12-17 11:48:00.000
2 1 'newstuff1' 2021-12-17 11:49:00.000
3 1 'newerstuff1' 2021-12-17 11:49:30.000
4 2 'stuff2' 2021-12-17 11:50:00.000
5 3 'stuff3' 2021-12-17 11:51:10.000
6 3 'newstuff3' 2021-12-17 11:52:00.000

And I need to get the most recent event for every order. So for this dataset the result should be

EVENTID ORDERID DATA TIMESTAMP
3 1 'newerstuff1' 2021-12-17 11:49:30.000
4 2 'stuff2' 2021-12-17 11:50:00.000
6 3 'newstuff3' 2021-12-17 11:52:00.000

I am achieving this using a correlated subquery:

SELECT *
FROM MESSAGES m1
WHERE TIMESTAMP = (SELECT MAX(TIMESTAMP) 
FROM MESSAGES m2 WHERE m1.ORDERID = m2.ORDERID);

But this is slow for large volumes so I'd like to know if there is a way to improve the query using a join or grouping. Otherwise I'm open to splitting the data into different tables if that would help.

CodePudding user response:

SELECT *
FROM table t1
WHERE NOT EXISTS ( SELECT NULL
                   FROM table t2
                   WHERE t1.orderid = t2.orderid
                     AND t1.`timestamp` < t2.`timestamp` )

I.e. simply select the row if there is no another row with the same order id and with more recent date.

PS. The index by (orderid, `timestamp`) will improve.

CodePudding user response:

SELECT m3.*
    FROM ( SELECT ORDERID, MAX(TIMESTAMP) AS TIMESTAMP
              FROM MESSAGES m1
              GROUP BY ORDERID ) m2
    JOIN MESSAGES m3 USING(ORDERID, TIMESTAMP);

and have

INDEX(ORDERID, TIMESTAMP)

The query is a "groupwise-max" pattern.

The derived table (subquery) will hop through the index, then look up the rest of the columns from the table.

If there are any duplicate timestamps for a given order id, it will deliver multiple rows. If that is not desired, see this for more options.

  • Related