Availability with Kubernetes

We run an internal health check of the service every 5 seconds, and Kubernetes liveness probes run every 1 second. So in the worst case, the Kubernetes load balancer's view of a pod is up to 6 seconds out of date.
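For reference, the liveness probe timing described above corresponds to roughly this in the Pod spec (a minimal sketch; the /healthz path and port are placeholders, not our real values):

livenessProbe:
  httpGet:
    path: /healthz   # placeholder endpoint
    port: 8080       # placeholder port
  periodSeconds: 1   # probe every 1 second, as described above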

My question is: what happens when a client request hits a pod that is broken but not yet seen by the load balancer as unhealthy? Should the client implement retry logic? Or should we implement backend logic to handle the case where a request hits a pod the load balancer does not yet see as unhealthy?

CodePudding user response:

I'm not sure what your architecture looks like, but load balancers are generally paired with an ingress controller such as Nginx.

The load balancer sits in front of the ingress controller and forwards traffic to the K8s Service; it is the Service, not the load balancer, that routes requests to the Pods.

The Service routes requests based on readiness: if a Pod is NotReady, it is removed from the Service's endpoints and requests won't reach it. If a request does reach such a Pod because of that propagation delay, there is a chance you get an internal error or similar in return.
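As a sketch, the readiness probe that controls this gating might look like the following (the /ready path, port, and thresholds are assumptions for illustration):

readinessProbe:
  httpGet:
    path: /ready       # assumed health endpoint
    port: 8080         # assumed container port
  periodSeconds: 5
  failureThreshold: 1  # mark the Pod NotReady after one failed check

While the Pod is NotReady, it is removed from the Service's endpoints, so new requests are not routed to it.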

Retries

Yes, you can implement retries on the client side, but if you are on K8s you can offload the retries to a service mesh. That makes the retry logic easier to maintain and integrate with K8s.

You can use a service mesh like Istio and implement the retry policy at the VirtualService level:

retries:
  attempts: 5
  retryOn: 5xx

Example VirtualService:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 3
      perTryTimeout: 2s
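
Combining the two snippets above, a VirtualService that retries only on server errors could look like this (retryOn is a standard field of Istio's HTTPRetry API; the attempt count and timeout are just example values):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 5
      perTryTimeout: 2s
      retryOn: 5xx       # retry only on 5xx responses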

Read more at: https://istio.io/latest/docs/concepts/traffic-management/#retries
