After upgrading Kubernetes from 1.18.13 to 1.19.5 I get the error below for some pods, seemingly at random. After some time the pod fails to start (it's a simple standalone pod that doesn't belong to a deployment):
Warning FailedMount 99s kubelet Unable to attach or mount volumes: unmounted volumes=[red-tmp data logs docker src red-conf], unattached volumes=[red-tmp data logs docker src red-conf]: timed out waiting for the condition
- On 1.18 we didn't have this issue; during the upgrade K8S also didn't show any errors or incompatibility messages.
- No additional logs from any other K8S components (I tried increasing the verbosity level for kubelet).
- Disk space and other host metrics such as load average and RAM are fine.
- No network storage, only local data.
- The PVs and PVCs are created before the pods and we don't change them.
- Tried higher K8S versions, but no luck.
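For reference, this is roughly how I have been looking for the underlying mount failure (a sketch; the grep pattern is just my guess at the relevant kubelet lines, and it assumes a systemd-managed kubelet):

```shell
# Events for the failing pod (names taken from the pod definition below):
#   kubectl describe pod provision -n red
# Kubelet logs on the node, filtered for volume/mount messages:
#   journalctl -u kubelet --since "30 min ago" | grep -iE 'volume|mount'

# The same filter applied to a sample of the event text above:
sample='Unable to attach or mount volumes: unmounted volumes=[red-tmp data logs docker src red-conf]: timed out waiting for the condition'
echo "$sample" | grep -iE 'volume|mount'
```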
We have pretty standard setup without any special customizations:
- CNI: Flannel
- CRI: Docker
- Only one node, acting as both master and worker
- 16 cores and 32 GB RAM
Example of pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: provision
    ver: latest
  name: provision
  namespace: red
spec:
  containers:
  - args:
    - wait
    command:
    - provision.sh
    image: app-tests
    imagePullPolicy: IfNotPresent
    name: provision
    volumeMounts:
    - mountPath: /opt/app/be
      name: src
    - mountPath: /opt/app/be/conf
      name: red-conf
    - mountPath: /opt/app/be/tmp
      name: red-tmp
    - mountPath: /var/lib/app
      name: data
    - mountPath: /var/log/app
      name: logs
    - mountPath: /var/run/docker.sock
      name: docker
  dnsConfig:
    options:
    - name: ndots
      value: "2"
  dnsPolicy: ClusterFirst
  enableServiceLinks: false
  restartPolicy: Never
  volumes:
  - hostPath:
      path: /opt/agent/projects/app-backend
      type: Directory
    name: src
  - name: red-conf
    persistentVolumeClaim:
      claimName: conf
  - name: red-tmp
    persistentVolumeClaim:
      claimName: tmp
  - name: data
    persistentVolumeClaim:
      claimName: data
  - name: logs
    persistentVolumeClaim:
      claimName: logs
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: red-conf
  labels:
    namespace: red
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 2Gi
  hostPath:
    path: /var/lib/docker/k8s/red-conf
  persistentVolumeReclaimPolicy: Retain
  storageClassName: red-conf
  volumeMode: Filesystem
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: conf
  namespace: red
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: red-conf
  volumeMode: Filesystem
  volumeName: red-conf
The tmp, data, and logs PVs have the same setup as conf, except for the path. Each has its own folder:
/var/lib/docker/k8s/red-tmp
/var/lib/docker/k8s/red-data
/var/lib/docker/k8s/red-logs
Currently I don't have any clues for how to diagnose the issue :(
I'd be glad to get any advice. Thanks in advance.
CodePudding user response:
You must be using local volumes. Follow the link below to understand how to create a StorageClass, PVs, and PVCs when using local volumes:
https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/
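Following that guide, a local-volume version of the conf PV might look roughly like this (a sketch only: local volumes require a node-affinity block, they support only ReadWriteOnce access, and `your-node-name` is a placeholder for the actual node name):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes have no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer     # delay binding until a pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: red-conf
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce                  # local volumes do not support ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/lib/docker/k8s/red-conf
  nodeAffinity:                    # required for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - your-node-name         # replace with the actual node name
```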
CodePudding user response:
I recommend you start troubleshooting by reviewing the VolumeAttachment events against the node the PV is tied to; perhaps your volume is still linked to a node that was evicted and replaced by a new one.
You can use this command to check your PV name and status:
kubectl get pv
Then, to review which node has the corresponding VolumeAttachment, you can use the following command:
kubectl get volumeattachment
Once you have the name of your PV and the node it is attached to, you will be able to see whether the PV is tied to the correct node, or to a previous node that is not working or was removed. When a node gets evicted, its pods are rescheduled onto a new available node from the pool; to see which nodes are ready and running, you can use this command:
kubectl get nodes
If you find that your PV is tied to a node that no longer exists, you will need to delete the VolumeAttachment with the following command:
kubectl delete volumeattachment [csi-volumeattachment_name]
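To see all three pieces of information at once, the standard VolumeAttachment fields (`spec.nodeName`, `spec.source.persistentVolumeName`) can be projected with custom columns (a sketch; the sample output and node names below are hypothetical):

```shell
# Show each VolumeAttachment with its PV and the node it is attached to:
#   kubectl get volumeattachment -o custom-columns=\
#     NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName
#
# Filtering hypothetical output for a node that no longer exists:
sample='csi-abc123   red-conf   old-node-1
csi-def456   red-tmp    worker-1'
echo "$sample" | grep 'old-node-1'
```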
If you need a detailed guide for this troubleshooting, you can follow this link.