I have a cron job set up on a Kubernetes cluster to process millions of records from a database. Sometimes the pod running the cron job gets evicted or OOM-killed. The issue I am facing is that whenever the cron job starts again, it processes all those records from the beginning.
I would like to understand how I should approach storing the progress of this cron job somewhere. Say I store it in a database: how frequently should I make a DB call to save the state?
CodePudding user response:
Maintaining progress of the CronJob
You can check the job by running kubectl describe job <your_job>, but this may not solve your problem.
Now the issue I am facing is whenever this cron job starts again it processes all those records from the beginning.
This is how a CronJob is meant to work. A CronJob only starts a given task on schedule and does not interfere with it beyond that. The solution to your problem is to change the script that the CronJob runs. As user Rakesh Gupta rightly mentioned in a comment:
Try to base your next iteration on the timestamp or UUID of the rows fetched already
In general, you have to change the process that works on the database. You can use a timestamp or UUID for this. Basically, you need an identifier that you can easily check before the process runs, so that it resumes from a specific position instead of starting all over again. Another solution may be to increase the available RAM if the process is dying from OOM kills.
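As a rough illustration of the resume-from-identifier idea, here is a minimal sketch in Python. It assumes the records have a monotonically increasing integer id and uses sqlite3 only so the example is self-contained. The table names (records, job_progress), the job name and the batch size are all hypothetical. A timestamp column would work in a similar way; a random UUID only works as a cursor if rows are read in UUID order. The checkpoint is saved once per batch, which is one possible answer to "how frequently": one small write per batch rather than per record.

```python
import sqlite3

BATCH_SIZE = 1000  # assumption: small enough that redoing one lost batch is cheap


def init_checkpoint(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) progress table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_progress ("
        "job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()


def load_checkpoint(conn: sqlite3.Connection, job: str) -> int:
    """Return the last id that was fully processed, or 0 on a first run."""
    row = conn.execute(
        "SELECT last_id FROM job_progress WHERE job = ?", (job,)
    ).fetchone()
    return row[0] if row else 0


def save_checkpoint(conn: sqlite3.Connection, job: str, last_id: int) -> None:
    """Persist progress; called once per batch, not once per record."""
    conn.execute(
        "INSERT OR REPLACE INTO job_progress (job, last_id) VALUES (?, ?)",
        (job, last_id),
    )
    conn.commit()


def run(conn, job, handle, batch_size=BATCH_SIZE) -> int:
    """Process records with id greater than the checkpoint, batch by batch."""
    last_id = load_checkpoint(conn, job)
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return processed
        for row_id, payload in rows:
            handle(row_id, payload)
        last_id = rows[-1][0]
        save_checkpoint(conn, job, last_id)  # the batch is done, remember it
        processed += len(rows)
```

If the pod is killed in the middle of a batch, the next run repeats that batch from its start, so handle should be safe to call twice for the same record. A smaller batch size means less repeated work after a kill but more checkpoint writes.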
CodePudding user response:
I know I am a bit late to the party; the suggestions from Rakesh Gupta and Mikolaj are pretty good.
You either raise the resource limit or store the progress in a DB.
I am not sure about the architecture of your actual app, but you can also use a Redis database or Redis deployment as a side option.
When your CronJob runs, it dumps the records into Redis and then processes them one by one from the queue inside Redis. This is a good option because it does not require many DB calls to the main database.
I am not sure which language you are using, but this library is a good example of managing a queue with Redis: https://github.com/OptimalBits/bull
Using this, you can manage the Redis queue and process the records with minimal DB calls and changes.
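To make the queue idea concrete without tying it to bull, here is a minimal sketch in Python of one common pattern: keep a pending list and a processing list, so that an item a killed pod was working on is not lost. This is not how bull works internally. The key names and the enqueue, recover and drain helpers are hypothetical. The r argument is assumed to be a redis-py style client (for example redis.Redis(decode_responses=True)), and only its rpush, lmove and lrem methods are used. LMOVE needs Redis 6.2 or newer.

```python
from typing import Callable, Iterable

PENDING = "records:pending"        # hypothetical key names
PROCESSING = "records:processing"


def enqueue(r, record_ids: Iterable[str]) -> None:
    """Load the work into Redis once, at the start of a fresh run."""
    ids = list(record_ids)
    if ids:
        r.rpush(PENDING, *ids)


def recover(r) -> None:
    """Move items left in flight by a killed pod back to the pending list."""
    while r.lmove(PROCESSING, PENDING, "LEFT", "RIGHT") is not None:
        pass


def drain(r, handle: Callable[[str], None]) -> int:
    """Process pending items one at a time; return how many were handled."""
    done = 0
    while True:
        # Atomically move one item from pending to processing.
        item = r.lmove(PENDING, PROCESSING, "LEFT", "RIGHT")
        if item is None:
            return done
        handle(item)
        r.lrem(PROCESSING, 1, item)  # acknowledge only after success
        done += 1
```

On startup the job would call recover and then drain; enqueue is only needed when the pending list has not been filled yet. This sketch assumes a single worker pod: with several workers, recover would also push back items that other pods are still working on.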