Running a cron to update 1 million records in every hour fails


We have an e-commerce system with more than 1 million users and a total of 4 to 5 million records in the order table. We use the CodeIgniter framework as the back end and MySQL as the database.

Due to this large number of users and purchases, we use cron jobs to update the order details and referral bonus points every hour to keep things working.

Now we have a situation where these data updates take more than one hour, so the next batch of updates starts before the previous one finishes, leading to deadlocks and failure of the system.

I'd like to know about the different possible architectural and database scaling options and suggestions for getting out of this situation. The application currently runs as a monolith.

CodePudding user response:

  • Don't use cron. Have a single process that starts over when it finishes. If one pass lasts more than an hour, the next one simply starts late. (Checking PROCESSLIST is clumsy and error-prone. OTOH, this continually-running approach needs a "keep-alive" cronjob; a crontab sketch follows this list.)

  • Don't UPDATE millions of rows. Instead, find a way to put the desired info in a separate table that the user joins to. Presumably, that extra table would have only 1 row (if everyone is governed by the same rules) or a small number of rows (if there are only a small number of patterns to handle). A joined-table sketch follows this list.

  • Do have the slowlog turned on, with a small value for long_query_time (possibly 1.0 seconds, maybe lower); the settings are sketched after this list. Use pt-query-digest to summarize the log and find the "worst" queries. Then we can help you make them take less time, thereby calming your busy system and improving the user experience.

  • Do use batched INSERTs. (One INSERT with 100 rows runs about 10 times as fast as 100 single-row INSERTs.) Batching UPDATEs is trickier, but can be done with IODKU (INSERT ... ON DUPLICATE KEY UPDATE); a batching example follows this list.

  • Do use batches of 100-1000 rows. (This is roughly optimal given the various things that can happen along the way.) A chunking pattern is sketched after this list.

  • Do use transactions judiciously. Do check for errors (including deadlocks) at every step; a transaction-plus-retry sketch follows this list.

  • Do tell us what you are doing in the hourly update. We might be able to provide more targeted advice.

  • Do realize that you have scaled beyond the capabilities of the typical 3rd-party package. That is, you will have to learn the details of SQL.
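
For the "don't use cron" point, here is a minimal keep-alive sketch, assuming a Linux host with util-linux's flock and a hypothetical updater.php script that performs one full pass over the pending work and exits:

    # Cron tries to (re)start the updater every minute; flock -n exits
    # immediately if a pass is still running, so at most one instance
    # ever runs and the next pass starts as soon as the previous ends.
    * * * * * flock -n /var/lock/order-updater.lock php /path/to/updater.php >> /var/log/order-updater.log 2>&1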
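
For the "don't UPDATE millions of rows" point, a joined-table sketch; the bonus_rate table and the orders columns are illustrative, not your actual schema:

    -- Keep the hourly-changing value in a tiny table:
    CREATE TABLE bonus_rate (
        id         TINYINT UNSIGNED NOT NULL PRIMARY KEY,  -- always 1
        multiplier DECIMAL(4,2)     NOT NULL
    );

    -- The hourly job now updates one row instead of millions:
    UPDATE bonus_rate SET multiplier = 1.25 WHERE id = 1;

    -- Readers derive the bonus at query time via a join:
    SELECT o.id, o.amount, o.amount * b.multiplier AS bonus_points
    FROM   orders AS o
    JOIN   bonus_rate AS b ON b.id = 1;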
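
The slowlog settings, shown here as runtime changes (mirror them in my.cnf so they survive a restart; the log path is an example):

    -- Enable the slow query log with a 1-second threshold.
    SET GLOBAL slow_query_log      = ON;
    SET GLOBAL long_query_time     = 1.0;
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

    -- Then, from the shell, summarize the worst offenders:
    --   pt-query-digest /var/log/mysql/slow.log > digest.txt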
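
A batching example with illustrative table names (on MySQL 8.0.20+ the VALUES() function is deprecated in favor of row aliases, but it still works):

    -- One statement inserting many rows, instead of many statements:
    INSERT INTO order_details (order_id, sku, qty)
    VALUES (1, 'A-100', 2), (2, 'B-200', 1), (3, 'C-300', 5);

    -- Batched UPDATE via IODKU: rows whose unique key already exists
    -- are updated in place rather than inserted.
    INSERT INTO referral_points (user_id, points)
    VALUES (101, 50), (102, 75), (103, 20)
    ON DUPLICATE KEY UPDATE points = points + VALUES(points);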
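
One common chunking pattern for the batch-size advice, again with made-up columns (bonus_applied would need an index to stay fast):

    -- Repeat in a loop, committing between rounds, until the statement
    -- reports 0 rows affected; each round locks at most 1000 rows.
    UPDATE orders
    SET    bonus_applied = 1
    WHERE  bonus_applied = 0
    LIMIT  1000;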
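
And a transaction-plus-retry sketch in plain PDO (CodeIgniter's DB layer would work equally well); connection details and table names are placeholders:

    <?php
    // Run one batch inside a transaction; retry if it deadlocks.
    $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass',
                   [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

    $attempts = 0;
    do {
        $retry = false;
        try {
            $pdo->beginTransaction();
            $pdo->exec("UPDATE orders SET bonus_applied = 1
                        WHERE bonus_applied = 0 LIMIT 1000");
            $pdo->commit();
        } catch (PDOException $e) {
            $pdo->rollBack();
            // MySQL signals deadlocks as error 1213 (SQLSTATE 40001);
            // they are expected under load, so just retry the batch.
            if ($e->errorInfo[1] == 1213 && ++$attempts < 5) {
                $retry = true;
            } else {
                throw $e;  // anything else is a real problem
            }
        }
    } while ($retry);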

CodePudding user response:

I have some ideas here for you, mixed in with some questions.

Assuming you are limited in what you can do (i.e. you can't re-architect your way out of this) and that the database can't be tuned further:

  • Make the list of records to be processed as small as possible

i.e. Does the job have to run over all records? These 4-5 million records: are they all active orders, or is that your all-time total? Obviously, just process the bare minimum; one way to do that is sketched below.
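
One way to shrink the working set is a watermark on an indexed updated_at column, so each pass only picks up rows touched since the previous successful pass. The table and column names here are made up:

    -- Hypothetical watermark query: process only what changed since
    -- the last pass, not all 4-5 million rows.
    SELECT id, user_id, amount
    FROM   orders
    WHERE  updated_at >= '2023-09-22 10:00:00'  -- start of the last pass
      AND  status = 'active';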

  • Split and parallel process

You mentioned "batches" but never explained what that meant - can you elaborate?

Can you get multiple instances of the cron job to run at once, each covering a different segment of the records?
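
If so, one simple split is by id modulo the worker count, so the workers' row sets (and therefore their lock sets) never overlap. A hypothetical crontab, assuming updater.php accepts shard arguments:

    # Four workers, each owning a quarter of the key space.
    0 * * * * php /path/to/updater.php --shard=0 --shards=4
    0 * * * * php /path/to/updater.php --shard=1 --shards=4
    0 * * * * php /path/to/updater.php --shard=2 --shards=4
    0 * * * * php /path/to/updater.php --shard=3 --shards=4

    # Inside the script, each worker filters its rows with:
    #   WHERE id % 4 = <shard>

Note that a modulo predicate cannot use an index; fixed id ranges per worker avoid that, at the cost of rebalancing as the table grows.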

  • Multi-Record Operations

The easy (lazy) way to program updates is a loop that iterates through each record and processes it individually, but relational databases can update many records in a single statement; the usual term is set-based operations, as opposed to row-by-row processing. Are you processing each row individually or doing multi-record updates? (A sketch follows at the end of this answer.)

How does the cron job query the database? Have you hand-crafted the most efficient queries possible, or are you using some ORM / framework to do stuff for you?
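
To make the set-based idea concrete, a sketch with made-up tables: instead of a PHP loop issuing one UPDATE per order, a single multi-table UPDATE credits every pending order at once.

    -- Row-by-row (slow): one round trip per order, e.g.
    --   UPDATE orders SET bonus_points = ... WHERE id = 42;  -- x millions
    -- Set-based (fast): one statement covers all qualifying rows.
    UPDATE orders AS o
    JOIN   referrals AS r ON r.order_id = o.id
    SET    o.bonus_points = r.points
    WHERE  o.bonus_applied = 0;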
