I want to run more than one instance of my Java application. The application works with a DB: it writes and reads data. I'd like to use Hibernate with L1 (Session-level) caching only. My question is: do I need to synchronize the caches across instances, or is there no need to worry about cache synchronization?
CodePudding user response:
It all depends on what your application is about.
Say you are running an e-commerce shop, and the admin panel has a service for managing products. Two separate users open the same product page and update it. There is nothing wrong with that (unless you have some specific business requirement).
Another scenario: you are tracking product inventory, say by maintaining a count of each product. When you add stock the count is increased, and when you sell products the count is decreased. This operation is very sensitive and requires some sort of locking. Without locking, the following can happen:
| Timestamp | App Instance A | App Instance B |
| --- | --- | --- |
| T1 | Reads and finds 10 products | Reads and finds 10 products |
| T2 | Removes two products and writes 8 | Does nothing |
| T3 | Does nothing | Adds two products and writes 12 |
The database now holds a wrong count: it should be 10 (10 − 2 + 2), but it stores 12 because instance B's write was based on a stale read. This is the classic "lost update" problem.
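To make the race concrete, here is a minimal, self-contained Java simulation of the table above. It uses a plain in-memory counter instead of a database, and the interleaving is forced to be sequential so the bug is reproducible; all names are made up for illustration.

```java
// Simulates the lost-update interleaving from the table above.
// An in-memory int stands in for the product-count row in the database.
public class LostUpdateDemo {

    static int stockInDb = 10; // the "row" both instances read and write

    public static void main(String[] args) {
        // T1: both "instances" read the current count
        int readByA = stockInDb; // A sees 10
        int readByB = stockInDb; // B sees 10

        // T2: A sells two products and writes back its local result
        stockInDb = readByA - 2; // DB now holds 8

        // T3: B adds two products, but based on its stale read of 10
        stockInDb = readByB + 2; // DB now holds 12 -- A's update is lost

        // The correct result would have been 10 (10 - 2 + 2)
        System.out.println("final count = " + stockInDb); // prints 12
    }
}
```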
To tackle these scenarios there are mainly two kinds of locking mechanisms:
- Optimistic locking
- Pessimistic locking
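As a sketch of the pessimistic flavor, here is a plain-Java version: the lock is taken before the read, so the whole read-modify-write is one critical section and a second writer must wait instead of working from a stale value. This uses `ReentrantLock` rather than Hibernate; with JPA you would ask the database for a row lock instead (e.g. `em.find(Product.class, id, LockModeType.PESSIMISTIC_WRITE)`), but the shape of the fix is the same. Class and field names are illustrative.

```java
import java.util.concurrent.locks.ReentrantLock;

// Pessimistic-style locking: the lock is held for the entire
// read-modify-write, so no other writer can interleave between
// our read and our write. (With Hibernate/JPA the database would
// hold the row lock instead.)
public class PessimisticDemo {

    private int stock = 10;                       // stands in for the DB row
    private final ReentrantLock lock = new ReentrantLock();

    public void changeStock(int delta) {
        lock.lock();
        try {
            int current = stock;   // the read happens under the lock
            stock = current + delta;
        } finally {
            lock.unlock();
        }
    }

    public int getStock() { return stock; }

    public static void main(String[] args) throws InterruptedException {
        PessimisticDemo db = new PessimisticDemo();
        Thread a = new Thread(() -> db.changeStock(-2)); // sells two
        Thread b = new Thread(() -> db.changeStock(+2)); // adds two
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("final count = " + db.getStock()); // prints 10
    }
}
```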
To learn more about this sort of locking, read here.
A simple way to implement optimistic locking in Hibernate is to add a version
column to the database table and to the application entity.
Here is a good article about entity versioning.
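Here is a rough sketch of what the version column buys you, simulated in plain Java (no Hibernate involved; names and structure are made up). The write only succeeds if the version read earlier is still current, which mirrors the `UPDATE ... WHERE version = ?` statement Hibernate issues for a `@Version`-annotated entity: a stale writer gets zero affected rows, and Hibernate raises an `OptimisticLockException` so the caller can retry with fresh data.

```java
// Simulates the version check Hibernate performs for an entity
// with a @Version column. Everything here is illustrative only.
public class OptimisticDemo {

    static int stock = 10;    // the "row"
    static int version = 1;   // the version column

    /**
     * Mimics: UPDATE product SET stock = ?, version = version + 1
     *         WHERE id = ? AND version = ?
     * Returns true if the "row" was updated (versions matched).
     */
    static boolean update(int newStock, int expectedVersion) {
        if (version != expectedVersion) {
            return false;          // stale write rejected (0 rows affected)
        }
        stock = newStock;
        version++;
        return true;
    }

    public static void main(String[] args) {
        // T1: both instances read stock=10, version=1
        int readA = stock, versionA = version;
        int readB = stock, versionB = version;

        // T2: A sells two products -- succeeds, version becomes 2
        boolean aOk = update(readA - 2, versionA);

        // T3: B adds two products based on its stale read -- rejected,
        // because B still holds version 1. Hibernate would throw an
        // OptimisticLockException here, and B would re-read and retry.
        boolean bOk = update(readB + 2, versionB);

        System.out.println(aOk + " " + bOk + " stock=" + stock); // true false stock=8
    }
}
```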
CodePudding user response:
You can use caches like Hazelcast, Infinispan or EHCache that allow caching to span multiple instances. These caches have different strategies, basically distributed (via a distributed hash table, DHT) or replicating. With a distributed cache only a subset of the data is on each instance, which leads to non-uniform access times, but it's possible to cache a vast amount of data. With a replicating cache all data is replicated across the instances, so you get fast access times, but modifications take longer because all instances need to be notified.
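For reference, wiring one of these providers in as Hibernate's second-level cache is mostly configuration. A sketch for a JCache provider such as EHCache 3 (the property names are the standard Hibernate ones, but the exact region factory and provider class depend on your Hibernate and cache versions, so treat this as a starting point rather than a drop-in config):

```properties
# hibernate.properties -- enable the second-level cache (sketch)
hibernate.cache.use_second_level_cache=true
# For Hibernate 5.3+ with the hibernate-jcache module on the classpath:
hibernate.cache.region.factory_class=jcache
hibernate.javax.cache.provider=org.ehcache.jsr107.EhcacheCachingProvider
```

You also have to opt entities into the cache, e.g. with `@Cacheable` and Hibernate's `@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)` on the entity class.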
To prevent dirty reads, Hibernate stops caching an object before the write transaction commits and starts caching it again afterwards. In the case of a replicating cache this adds at least two network requests, so write throughput can drop quite dramatically.
There are many details that should be understood, and ideally tested, before going into production. In particular: what happens when an instance is added, dies, or is unreachable for some amount of time? When I looked at the Hibernate code a few years back there was a hard-coded 30-second timeout for a cache lock, meaning: if an instance that was modifying data disappears, other instances modifying the same data would "hang" for up to 30 seconds. But if the node did not actually die, and it was just a connection problem and it reappears after the timeout, you will get data inconsistencies.

Within the caches themselves you also have timers and recovery strategies for failures and connection problems that you need to understand and configure correctly for your operating environment.