I am trying to implement a kind of sync service. Two clients with different user agents may POST/PATCH to /sync/user/{user_id}/resource at the same time with the same user_id. The sync should update the data for the user with id={user_id} in the DB.
func (syncServer *SyncServer) Upload(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
	userID := ps.ByName("user_id")
	// Reject the request if another upload for this user is still in progress.
	if isAlreadyProcessedForUser(userID) {
		w.WriteHeader(http.StatusConflict)
		return
	}
	...
	syncServer.db.Update(userID, data)
	...
}
The problem is that I have no idea how to correctly decline one Upload while another one is still processing a request for the same user_id. I think using mutex.Lock() is a bad idea, because this handler will run on many pods, and if Upload is called on different pods a local mutex won't help me anyway. What synchronization method can I use to solve this problem? Should I use an additional field in the DB? Any idea is welcome!
Answer:
There are many ways to do this (distributed locking) in a distributed system; here are some that come to mind:
- Use a Redis lock (or any other similar service). You can lock each `user_id` on receiving the first request and reject other requests for the same `user_id` because they will fail to acquire the lock. Redis locks generally have an expiration time, so you won't deadlock. Ref: https://redis.io/docs/reference/patterns/distributed-locks/ (see the first sketch after this list).
- Use a database lock. You should be careful with database locks, but a simple way to do this is with a unique index: create an `uploading` record with a `unique(user_id)` constraint before the upload and delete it after the upload. It's possible to forget, or fail, to delete the record and cause a deadlock, so you might want to add an `expired_at` field to the record and check & drop stale records before uploading (see the second sketch after this list).
- (Specific to the question's scenario) Use a unique constraint on `(user_id, upload_status)` that is only enforced when `upload_status = 'uploading'`; this is called a partial index. Then you can create an `uploading` record for each request and reject the competing request. Expiration is also needed, so track the `start_time` of the upload and clean up long-running `uploading` records. If you don't need to reclaim the disk space, you can simply mark the record as `failed`; that way you can also track when and how these uploads failed in the database (see the third sketch after this list).
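Here is a minimal sketch of the Redis option, assuming the go-redis client (github.com/redis/go-redis/v9) and a hypothetical `redis` field on `SyncServer`; the key prefix and TTL are placeholders:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"github.com/julienschmidt/httprouter"
	"github.com/redis/go-redis/v9"
)

type SyncServer struct {
	redis *redis.Client // hypothetical field; the real SyncServer also holds db, etc.
}

// tryLockUser issues SET sync:lock:<user_id> <value> NX EX 30: it succeeds only
// if the key does not exist yet, so exactly one pod wins for a given user_id.
func (s *SyncServer) tryLockUser(ctx context.Context, userID string) (bool, error) {
	return s.redis.SetNX(ctx, "sync:lock:"+userID, "processing", 30*time.Second).Result()
}

// unlockUser releases the lock; the TTL covers the case where this never runs.
func (s *SyncServer) unlockUser(ctx context.Context, userID string) {
	s.redis.Del(ctx, "sync:lock:"+userID)
}

func (s *SyncServer) Upload(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
	userID := ps.ByName("user_id")
	ok, err := s.tryLockUser(r.Context(), userID)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	if !ok {
		// Another pod is already processing an upload for this user.
		w.WriteHeader(http.StatusConflict)
		return
	}
	defer s.unlockUser(r.Context(), userID)
	// ... parse the body and update the DB here ...
	w.WriteHeader(http.StatusOK)
}
```

`SetNX` maps to `SET key value NX EX`, so only one pod can create the key for a given user. For a stricter release (only delete a lock you still own), store a unique token as the value and verify it before deleting, as described in the Redis doc linked above.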
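A sketch of the unique-index variant with `database/sql`, assuming a PostgreSQL-style table such as `sync_uploads(user_id TEXT PRIMARY KEY, expired_at TIMESTAMPTZ NOT NULL)`; the table, column, and function names are illustrative:

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

type SyncLocker struct {
	db *sql.DB
}

// TryLock inserts the "uploading" record; the primary key on user_id makes the
// INSERT fail while another upload for the same user is in flight.
func (l *SyncLocker) TryLock(ctx context.Context, userID string) (bool, error) {
	// Drop a stale record first so a crashed pod cannot deadlock the user forever.
	if _, err := l.db.ExecContext(ctx,
		`DELETE FROM sync_uploads WHERE user_id = $1 AND expired_at < now()`, userID); err != nil {
		return false, err
	}
	_, err := l.db.ExecContext(ctx,
		`INSERT INTO sync_uploads (user_id, expired_at) VALUES ($1, $2)`,
		userID, time.Now().Add(5*time.Minute))
	if err != nil {
		// A unique-constraint violation means another upload holds the lock; in real
		// code, distinguish that case from other errors via the driver's error type.
		return false, nil
	}
	return true, nil
}

// Unlock deletes the record once the upload finishes (or fails).
func (l *SyncLocker) Unlock(ctx context.Context, userID string) error {
	_, err := l.db.ExecContext(ctx, `DELETE FROM sync_uploads WHERE user_id = $1`, userID)
	return err
}
```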
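And a sketch of the partial-index variant (PostgreSQL syntax); the table and index names are illustrative. Inserting a second `'uploading'` row for the same `user_id` violates the partial unique index, which you can map to a 409 response, and a periodic cleanup can flip rows whose `start_time` is too old to `'failed'`:

```go
// DDL for the partial-index approach: the unique index only applies to rows
// still in the 'uploading' state, so finished or failed rows never block a new upload.
const uploadsSchema = `
CREATE TABLE IF NOT EXISTS uploads (
    id            BIGSERIAL   PRIMARY KEY,
    user_id       TEXT        NOT NULL,
    upload_status TEXT        NOT NULL,            -- 'uploading', 'done', 'failed'
    start_time    TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX IF NOT EXISTS one_active_upload_per_user
    ON uploads (user_id)
    WHERE upload_status = 'uploading';
`
```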
CAUTION:
- It seems that you're using Kubernetes, so any non-distributed lock should be used cautiously, depending on the level of consistency you want to achieve. Pods are volatile, and it's hard to rely on local state and still stay consistent, because pods might be duplicated, killed, or re-scheduled onto another machine. This also applies to any other platform with auto-scaling or scheduling mechanisms.
- A syncing process between a server and several clients owned by one user needs to handle at least request ordering, request deduplication, and eventual consistency (e.g. Google Docs supports many people editing at the same time). There are some generic algorithms for this (like operational transformation), but the right choice depends on your specific use case.