I have an old k8s cluster with 1 master and 2 worker nodes. It was shut down for a long time and I have now started it again. It had many running pods and deployments. After restarting the VMs, every k8s command returns
The connection to the server 123.70.70.70:6443 was refused - did you specify the right host or port?
What have I done so far?
I have seen many Stack Overflow questions about this error, as well as posts on GitHub and other sites. All of them require kubeadm reset.
If I reset, I will lose all running pods, and I don't know how to start those pods again since they were not deployed by me.
What do I want? Is there a way to get all the pods and nodes up and running without a reset? Or, even if I do reset, how can I get all the pods back to their running state? This cluster was designed and set up by someone else and I have no knowledge of its deployments.
CodePudding user response:
First, let me explain the error. Since you have restarted your servers (nodes, in Kubernetes terms), if the IP address assigned to these nodes is not static, the previous cluster configuration will no longer work and your cluster enters panic mode; refer to this doc for making your cluster up and running.
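As a rough check (assuming a kubeadm-based setup, where the API server runs as a static pod from the default manifest path), you can compare the node's current IP with the address the control plane was configured to advertise:
# Current IP addresses on the master node
ip addr show
# Address the API server was told to advertise
sudo grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
If the two no longer match (for example, the VM received a new address via DHCP after the long shutdown), that would explain why 123.70.70.70:6443 refuses connections.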
Once your cluster is up and running, you can use kubectl commands to list all the services, deployments and namespaces. Take the output of all of these, export them as YAML manifests and store them as backups.
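A minimal sketch of such a backup (the namespace and file names are only placeholders):
# List the main resources across every namespace
kubectl get namespaces
kubectl get deployments,services --all-namespaces
# Export the manifests of one namespace as YAML for backup
kubectl get deployments,services -n <namespace> -o yaml > backup-<namespace>.yaml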
If you are taking downtime and trying to restart your pods, it won't cause any data loss or application failure. This document provides details on how to restart multiple pods at the same time, but in general frequent restarts are not recommended. Hope this addresses your query; if you can share why you are planning to restart your cluster, I can try to provide a more accurate solution.
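For instance (assuming the workloads are managed by Deployments), a rolling restart can be triggered per deployment or per namespace:
# Restart a single deployment (placeholder names)
kubectl rollout restart deployment <deployment-name> -n <namespace>
# Restart every deployment in a namespace
kubectl rollout restart deployment -n <namespace>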
CodePudding user response:
The error you are getting usually appears when the KUBECONFIG environment variable is not exported. Run the following commands as a regular user, or run only the last command as root.
sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf
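If you want this to survive new shell sessions, one option (assuming a bash shell) is to append the export to your profile:
echo 'export KUBECONFIG=$HOME/admin.conf' >> ~/.bashrc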
Refer to my SO answer here.
Now that you are able to run kubectl commands, you should see any pods that were created as control plane components or as workloads. Use the following command to list the nodes that are part of your cluster.
kubectl get nodes
Also verify that all the control plane components are running fine:
kubectl get pods -n kube-system
CodePudding user response:
Based on what you mentioned, the API server component of the cluster is not working as desired. This can be an issue with the API server itself failing to start, or with it failing to reach the etcd component.
Log in to the master node and, depending on the container runtime, check whether the containers are running well, especially the API server and etcd. If you do not see the containers running, use the -a option to also list stopped ones. For example, with Docker:
docker ps -a | grep api
or
docker ps -a | grep etcd
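If the cluster uses containerd or CRI-O instead of Docker (an assumption, since the runtime isn't stated), the equivalent checks go through crictl:
sudo crictl ps -a | grep api
sudo crictl ps -a | grep etcd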
Once you find the container, get its logs; they should give you a clue as to why the API server component is not starting up. Based on what you see, you can update your question with those log entries.
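For example, with Docker (the container ID is a placeholder), falling back to the kubelet logs if no API server container is created at all:
docker logs <container-id>
# The kubelet logs often show why a static pod fails to come up
sudo journalctl -u kubelet --no-pager | tail -n 50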