Celery does not detect closed broker connections and freezes after a while

I am running celery==3.1.16 on Python 2.7 with Redis as the broker. The worker processes tasks normally at first, but it freezes after a while. I checked the TCP connections in the container and see this:

tcp        1      0 WORKER:34472 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39884 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57292 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:60030 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39906 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57102 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34508 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39874 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57182 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57106 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39870 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57056 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39902 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34494 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:40878 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39878 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:53138 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:43818 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:39876 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:50586 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59800 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57128 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57238 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57346 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57050 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39896 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:44850 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57124 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39904 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:39872 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:57160 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57190 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59724 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57260 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34492 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59740 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:55426 REDIS_HOST:6379 CLOSE_WAIT

I think the Celery worker does not detect that these connections have been closed, so it never re-establishes them. I have to restart the worker pod periodically to get it working again.
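
For anyone who wants to track this over time, here is a minimal sketch that tallies connection states from netstat-style output like the listing above. CLOSE_WAIT means the remote side (Redis here) has closed the connection but the local process has not yet closed its socket. The function name `count_states` and the column layout it expects (protocol, Recv-Q, Send-Q, local address, remote address, state) are my own assumptions for illustration, not part of Celery or Redis.

```python
from collections import Counter


def count_states(netstat_output, port=6379):
    """Tally TCP states for connections whose remote port equals `port`.

    Hypothetical helper: assumes each line has the columns
    proto, recv-q, send-q, local address, remote address, state.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote = fields[4]
        # The port is whatever follows the last colon in the remote address.
        if remote.rsplit(":", 1)[-1] == str(port):
            counts[fields[5]] += 1
    return counts


# A few lines copied from the listing above.
sample = """\
tcp        1      0 WORKER:34472 REDIS_HOST:6379 CLOSE_WAIT
tcp        0      0 WORKER:60030 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39884 REDIS_HOST:6379 CLOSE_WAIT
"""

print(count_states(sample))
```

A CLOSE_WAIT count that keeps growing while the ESTABLISHED count stays flat would be consistent with the worker holding on to sockets that Redis has already closed.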

Any ideas?

CodePudding user response:

I now run Celery with the solo pool and scale out with multiple Kubernetes pods instead of one pod, because the solo pool is single-threaded. That solved the problem.

https://docs.celeryproject.org/en/stable/internals/reference/celery.concurrency.solo.html
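
As a rough sketch of the setup described above, the worker in each pod can be started with the `--pool=solo` option of the `celery worker` command. The app name `proj` and the helper names `build_worker_command` and `run_worker` are placeholders I made up for illustration. The number of pods is controlled on the Kubernetes side and is not shown here.

```python
import subprocess


def build_worker_command(app_name, loglevel="info"):
    """Build the argv list for one solo-pool Celery worker.

    Hypothetical helper: `app_name` is whatever module holds your Celery app.
    """
    return [
        "celery",
        "-A", app_name,
        "worker",
        "--pool=solo",  # execute tasks inline in the worker process, one at a time
        "--loglevel={0}".format(loglevel),
    ]


def run_worker(app_name):
    """Start the worker and block until it exits; returns the exit code."""
    return subprocess.call(build_worker_command(app_name))


print(" ".join(build_worker_command("proj")))
```

Calling `run_worker("proj")` as the container entry point would start one solo worker per pod. Throughput then comes from the replica count rather than from a larger pool inside a single pod.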