Let's say we have an inventory system that tracks the available number of products in a shop (quantity). So we can have something similar to this:
Id | Name | Quantity |
---|---|---|
1 | Laptop | 10 |
We need to think about two things here:
- Be sure that `Quantity` is never negative.
- If we have simultaneous requests for a product, we must ensure that `Quantity` stays valid.
In other words, we can have:
- `request1` for 5 laptops (this request will be processed on `thread1`)
- `request2` for 1 laptop (this request will be processed on `thread2`)
When both requests are processed, the database should contain:
Id | Name | Quantity |
---|---|---|
1 | Laptop | 4 |
However, that might not be the case, depending on how we write our code. If on our server we have something similar to this:
var product = _database.GetProduct();
if (product.Quantity - requestedQuantity >= 0)
{
    product.Quantity -= requestedQuantity;
    _database.Save();
}
With this code, it's possible that both requests (executed on separate threads) hit the first line of the code at the exact same time:
- `thread1`: `_database.GetProduct(); // Quantity is 10`
- `thread2`: `_database.GetProduct(); // Quantity is 10`
- `thread1`: `product.Quantity = 10 - 5 = 5`
- `thread2`: `product.Quantity = 10 - 1 = 9`
- `thread1`: `_database.Save(); // Quantity is 5`
- `thread2`: `_database.Save(); // Quantity is 9`
What has just happened? We have sold 6 laptops, but the stored quantity dropped by only one (from 10 to 9), because the second save overwrote the first.
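To make the interleaving concrete, here is a minimal sketch that replays it step by step on a single thread, so the outcome is deterministic. The variable `storedQuantity` is a hypothetical in-memory stand-in for the `Quantity` column, not the real `_database` object from the question.

```csharp
using System;

// Hypothetical in-memory stand-in for the Quantity column of the Laptop row.
int storedQuantity = 10;

// Step 1: both requests read the row before either of them writes.
int snapshot1 = storedQuantity; // thread1 sees 10
int snapshot2 = storedQuantity; // thread2 sees 10

// Step 2: each request checks and updates its own private copy.
if (snapshot1 - 5 >= 0) snapshot1 -= 5; // thread1 computes 5
if (snapshot2 - 1 >= 0) snapshot2 -= 1; // thread2 computes 9

// Step 3: both requests save; the last writer silently wins.
storedQuantity = snapshot1; // Quantity is 5
storedQuantity = snapshot2; // Quantity is 9, thread1's update is lost

Console.WriteLine($"Stored: {storedQuantity}, expected: {10 - 5 - 1}");
// prints "Stored: 9, expected: 4"
```

Both precondition checks pass because each one is evaluated against a snapshot that is already stale by the time the save happens.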
How to approach this problem?
To ensure the quantity never goes negative, we can use a DB constraint (to imitate an unsigned int).
To deal with the race condition, we usually use `lock` and similar techniques.
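As an illustration of the `lock` approach inside a single process, a minimal sketch could look like this. The names `quantity`, `gate`, and `TryReserve` are hypothetical, and the stock lives in memory rather than in a database.

```csharp
using System;
using System.Threading.Tasks;

int quantity = 10;          // hypothetical in-memory stock for one product
object gate = new object(); // guards every read and write of quantity

// The check and the decrement happen under the same lock,
// so no other thread can slip in between them.
bool TryReserve(int requested)
{
    lock (gate)
    {
        if (quantity - requested < 0) return false; // not enough stock
        quantity -= requested;
        return true;
    }
}

// Two simultaneous requests, as in the scenario above.
Task<bool> request1 = Task.Run(() => TryReserve(5));
Task<bool> request2 = Task.Run(() => TryReserve(1));
Task.WaitAll(request1, request2);

Console.WriteLine(quantity); // prints 4
```

This only protects threads that share the same `gate` object, which means threads inside one process.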
Depending on the case, that might work if we have a single instance of the server. But what should we do when we have multiple instances of the server, each running in a multithreaded environment?
It seems to me that the moment you have more than one web server, your only reasonable option for locking is the database. Why do I say reasonable? Because we also have `Mutex`.
A `lock` allows only one thread at a time to enter the locked section, and the lock is not shared with any other process.
A `Mutex` is like a lock, but it can be system-wide (shared by multiple processes on the same machine).
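For comparison, a named `System.Threading.Mutex` might be used roughly as follows. The name `inventory-laptop-1` is a hypothetical example; any process on the same machine that opens a mutex with that name shares it, but processes on other machines do not.

```csharp
using System;
using System.Threading;

// Hypothetical mutex name; processes on this machine that use the same
// name contend for the same operating-system mutex.
using var mutex = new Mutex(initiallyOwned: false, name: "inventory-laptop-1");

// Wait up to five seconds for ownership instead of blocking forever.
bool acquired = mutex.WaitOne(TimeSpan.FromSeconds(5));
if (acquired)
{
    try
    {
        // Critical section: only one thread in one process on this
        // machine can be here at a time.
        Console.WriteLine("Inside the machine-wide critical section");
    }
    finally
    {
        // Must be released by the same thread that acquired it.
        mutex.ReleaseMutex();
    }
}
```

Because the mutex belongs to one operating system instance, it does not coordinate server instances running on different hosts.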
Now, this is my personal opinion, but I expect that managing a `Mutex` across processes in a microservice-oriented world, where a new server instance can spin up or an existing one can die at any second, is tricky and messy (do we have some GitHub example?).
How to solve the problem then?
- Stored procedure - offload the responsibility to the database. Write a new stored procedure and wrap the whole logic in a transaction. Each server calls this SP, and we don't need to worry about anything else. But might this be slow?
- `SELECT ... FOR UPDATE` - I came across this while investigating the problem. With this approach, we still solve the problem at the database level.
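To show roughly what the `SELECT ... FOR UPDATE` option involves, the sketch below lists the statements such a transaction could send to PostgreSQL. The table and column names (`products`, `id`, `quantity`) and the `@id` / `@requested` placeholders are assumptions for illustration; the program only prints the statements and does not connect to a database.

```csharp
using System;

// Hypothetical schema: products(id, quantity). @id and @requested stand
// for parameters bound by whatever data-access layer is in use.
string[] transaction =
{
    "BEGIN;",
    // Locks the selected row until COMMIT or ROLLBACK; a second
    // transaction running the same SELECT waits here.
    "SELECT quantity FROM products WHERE id = @id FOR UPDATE;",
    "-- application code checks quantity - requested >= 0 here",
    "UPDATE products SET quantity = quantity - @requested WHERE id = @id;",
    "COMMIT;"
};

foreach (string statement in transaction)
{
    Console.WriteLine(statement);
}
```

The point of the row lock is that the read, the check, and the write can no longer interleave with another request for the same product, regardless of which server instance issued it.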
Taking all of the above into account, what would be the best approach to solve this problem? Is there any other solution I am missing? What would you suggest?
I am working in .NET and using EF Core with PostgreSQL, but I think that this is really a language-agnostic question and that the principle for solving the issue is similar in all environments (and similar for many relational databases).
Answer:
After reading the majority of the comments, let's assume that you need a solution for a relational database.
The main thing you need to guarantee is that the write operation at the end of your code only happens if the precondition is still valid (e.g. `product.Quantity - requestedQuantity >= 0`).
This precondition is evaluated on the application side, in memory. But the application only sees a snapshot of the data as of the moment the database read happened: `_database.GetProduct();`
This snapshot might become obsolete as soon as someone else updates the same data. If you want to avoid using `SERIALIZABLE` as the transaction isolation level (which has performance implications anyway), the application should detect, at the moment of writing, whether the precondition is still valid. Or, said differently, whether the data has remained unchanged while the application was working on it.
This can be done by using offline concurrency patterns: either an optimistic offline lock or a pessimistic offline lock. Many ORM frameworks support these features by default.
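As a rough illustration of the optimistic variant, the sketch below models a row with a version number and only accepts a write if the version is unchanged since the read. Everything here (`storedQuantity`, `storedVersion`, `TrySave`, `TryReserve`) is a hypothetical in-memory stand-in; in a real database the version check would be part of the `UPDATE` statement's `WHERE` clause, and the `lock` below merely imitates that statement's atomicity.

```csharp
using System;

// Hypothetical in-memory stand-in for one product row with a version column.
int storedQuantity = 10;
int storedVersion = 0;
object rowGate = new object();

// Imitates an UPDATE that also filters on the version that was read:
// the write succeeds only if nobody changed the row in the meantime.
bool TrySave(int expectedVersion, int newQuantity)
{
    lock (rowGate)
    {
        if (storedVersion != expectedVersion) return false; // stale snapshot
        storedQuantity = newQuantity;
        storedVersion++;
        return true;
    }
}

// Read, check the precondition, and retry if the snapshot went stale.
bool TryReserve(int requested)
{
    while (true)
    {
        int quantity, version;
        lock (rowGate) { quantity = storedQuantity; version = storedVersion; }
        if (quantity - requested < 0) return false; // not enough stock
        if (TrySave(version, quantity - requested)) return true;
    }
}

// Replay of the problematic interleaving: both requests read version 0.
int version1 = storedVersion, quantity1 = storedQuantity;
int version2 = storedVersion, quantity2 = storedQuantity;
bool first = TrySave(version1, quantity1 - 5);  // succeeds, version becomes 1
bool second = TrySave(version2, quantity2 - 1); // rejected, snapshot is stale
bool retried = TryReserve(1);                   // re-reads 5, writes 4

Console.WriteLine($"{first} {second} {retried} {storedQuantity}");
// prints "True False True 4"
```

The stale write is rejected instead of overwriting the first one, and the retry works from fresh data, so the final quantity matches the expected 4.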