Home > database >  Using database transactions to ensure only one process picks up a job from jobs table
Using database transactions to ensure only one process picks up a job from jobs table

Time:10-10

I want to ensure that only one worker node (out of a cluster of nodes) picks up any given job from a Jobs table and processes it. I'm using the following database transaction on my Jobs table to achieve this:

BEGIN TRANSACTION;

SELECT id FROM jobs 
WHERE status = 'NEW' 
ORDER BY created_at
LIMIT 1;

-- rollback if id is null

UPDATE jobs 
SET status = 'IN_PROCESS' 
WHERE id = 'id';

COMMIT;

Will the above TRANSACTION ensure that only one node will pick up any given job? Is it possible that two nodes will read the SELECT statement simultaneously as NEW, and then both will run the UPDATE statement (after the first one releases the row lock) and start processing the same job?

In other words, will the TRANSACTION provide a lock for the SELECT statement as well, or only for the UPDATE statement?

CodePudding user response:

No, transactions won't help you here, unless you raise the isolation level to SERIALIZABLE. It's an option of last resort, but I would avoid it if possible.

The possibilities I see are:

  1. Pessimistic Locks. Add FOR UPDATE to the SELECT. These limit performance.

  2. Optimistic Locks. Seems like the best fit for your needs.

  3. Use a predefined queue manager linked to your table.

  4. Implement a queue, or a queue process.

Seems to me that #2 is the best fit for it. If that's the case you'll need to add an extra column to the table: version. With that column added your queries will need to be changed as:

SELECT id, version FROM jobs 
WHERE status = 'NEW' 
ORDER BY created_at
LIMIT 1;

UPDATE jobs 
SET status = 'IN_PROCESS', version = version   1 
WHERE id = 'id_retrieved_before' and version = 'version_retrieved_before';

The update above returns the number of updated rows. If the count is 1 then this thread got the row. If it's 0, then another competing thread got the row and you'll need to retry this strategy again.

As you can see:

  • No transactions are necessary.
  • All engines return the number of updated rows, so this strategy will work well virtually in any database.
  • No locks are necessary. This offers great performance.
  • The downside is that if another thread got the row, the logic needs to start again from the beginning. No big deal, but it doesn't ensure optimal response time in the scenario when there are many competing threads.

Finally, if your database is PostgreSQL, you can pack both SQL statements into a single one. Isn't PostgreSQL awesome?

  • Related