Kubernetes pods went to 0/2 during a node pool upgrade even though disruption budgets were set to min available 50%

Time: 03-18

I'm upgrading some AKS clusters for an app and have been testing the az aks nodepool upgrade --max-surge flag to speed up the process. Our prod environment has 50 nodes, and at the per-node speed I've seen in our lower environments I estimate prod will take 9 hours to complete. On one of the lower-environment upgrades I ran with a max surge of 50%, which did help a little with speed, and every deployment kept a minimum of 50% of its pods available.

For this latest upgrade I tried a max surge of 100%. That spun up 6 new nodes (there are 6 current nodes in the pool) on the correct version... but then it migrated every deployment/pod at the same time and took everything down to 0/2 pods. Before I started this process I made sure every single deployment had a pod disruption budget with min available set to 50%. This has worked on all of my other upgrades except this one, which suggests to me that the 100% surge is the cause.

I just can't figure out why my minimum-available percentage was ignored. Below are the descriptions of an example PDB and the corresponding deployment.

Pod disruption budget:

Name:           myapp-admin
Namespace:      front-svc
Min available:  50%
Selector:       role=admin
Status:
    Allowed disruptions:  1
    Current:              2
    Desired:              1
    Total:                2
Events:
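For reference, the Status numbers above follow from the PDB math: with a percentage minAvailable, Kubernetes rounds up when computing the desired healthy pod count, and allowed disruptions is what's left over. A quick sketch of that arithmetic (values taken from the PDB above):

```shell
# Reproduce the PDB status fields shown above.
total=2               # Total: pods matched by the selector
min_available_pct=50  # Min available: 50%

# desired = ceil(total * pct / 100), via integer arithmetic
desired=$(( (total * min_available_pct + 99) / 100 ))
allowed=$(( total - desired ))

echo "Desired: $desired"              # matches Desired: 1
echo "Allowed disruptions: $allowed"  # matches Allowed disruptions: 1
```

So the PDB itself only ever permits one voluntary eviction at a time for this deployment, which is why the simultaneous 0/2 is surprising.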

Deployment(snippet):

Name:                   myapp-admin
Namespace:              front-svc
CreationTimestamp:      Wed, 26 May 2021 16:17:00 -0500
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 104
Selector:               agency=myorg,app=myapp,env=uat,organization=myorg,role=admin
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        15
RollingUpdateStrategy:  25% max unavailable, 1 max surge
Pod Template:
  Labels:       agency=myorg
                app=myapp
                buildnumber=1234
                env=uat
                organization=myorg
                role=admin
  Annotations:  kubectl.kubernetes.io/restartedAt: 2022-03-12T09:00:11Z
  Containers:
   myapp-admin-ctr: 

Is there something obvious I am doing wrong here?

CodePudding user response:

... a max surge value of 100% provides the fastest possible upgrade process (doubling the node count) but also causes all nodes in the node pool to be drained simultaneously.

That is from the official AKS documentation: at 100% surge every node in the pool is drained at the same time, so there are no untouched nodes left to keep your pods running while their replacements come up. You may want to consider lowering your max surge.
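For example, a lower surge value drains only part of the pool at a time (resource group, cluster, and pool names below are placeholders, and 33% is just an illustrative value):

```shell
# Surge roughly a third of the pool at a time instead of all of it at once,
# so the remaining nodes keep serving pods while others drain.
az aks nodepool upgrade \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --kubernetes-version <target-version> \
  --max-surge 33%
```

With 6 nodes that means roughly 2 drained concurrently, so a 2-replica deployment spread across nodes always has at least one pod left running and the PDB can actually do its job.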
