Is there an indicator from a Kubernetes Cluster that can indicate whether that cluster has been upgr

Time:04-12

I'm trying to find some sort of signal from a cluster indicating that there has been some sort of change with a Kubernetes cluster. I'm looking for any change that could cause issues with software running on that cluster such as Kubernetes version change, infra/distro/layout change, etc.

The only signal that I have been able to find is a node restart, but this can happen for any number of reasons - I'm trying to find something a bit stronger than this. I am preferably looking for something platform agnostic as well.

CodePudding user response:

In addition to watching Node events, you can use the Kubernetes Node Problem Detector to monitor and report on a node's health.

There are tons of node problems that could possibly affect the pods running on the node, such as:

  • Infrastructure daemon issues: ntp service down;
  • Hardware issues: Bad CPU, memory or disk;
  • Kernel issues: Kernel deadlock, corrupted file system;
  • Container runtime issues: Unresponsive runtime daemon;

Node-problem-detector collects node problems from various daemons and makes them visible to the upstream layers.

Node-problem-detector supports several exporters:

  • Kubernetes exporter reports node problems to Kubernetes API server: temporary problems get reported as Events, and permanent problems get reported as Node Conditions.
  • Prometheus exporter.
  • Stackdriver Monitoring API.
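
Since the Kubernetes exporter surfaces permanent problems as Node Conditions, one way to consume them is to scan each Node object's `status.conditions`. Below is a minimal sketch, assuming you already have the Node as a parsed dict (for example from `kubectl get node <name> -o json`). The function name `problem_conditions` and the sample data are made up for illustration, and `KernelDeadlock` is used only as an example of a condition type node-problem-detector may report.

```python
from typing import List


def problem_conditions(node: dict) -> List[str]:
    """Return the condition types on a Node object that signal a problem."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        if ctype == "Ready":
            # Ready is the one standard condition where "True" means healthy.
            if status != "True":
                problems.append(ctype)
        elif status == "True":
            # For pressure/problem-style conditions, "True" means trouble.
            problems.append(ctype)
    return problems


# Illustrative Node object, trimmed to the fields the function reads.
sample_node = {
    "metadata": {"name": "worker-1"},
    "status": {
        "conditions": [
            {"type": "KernelDeadlock", "status": "True"},
            {"type": "MemoryPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    },
}

print(problem_conditions(sample_node))
```

This only tells you that something is wrong on a node, not that the cluster was upgraded, so it is best combined with a version check.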

Another option is the Prometheus Node Exporter. It exposes a wide variety of hardware- and kernel-related metrics (OS release info, system information as provided by the 'uname' system call, memory statistics, disk IO statistics, NFS statistics, etc.).

Check the list of all existing collectors and the supported systems here.
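
To spot a kernel or OS image change from those metrics, you could read the labels on `node_uname_info`, which carries the uname fields as labels. Here is a rough sketch that pulls the labels out of the text exposition format. The helper name `metric_labels` and the sample scrape are hypothetical, and the regex is deliberately simple: it does not handle escaped quotes inside label values.

```python
import re

# Matches key="value" pairs; escaped quotes in values are not supported.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def metric_labels(exposition: str, metric: str) -> dict:
    """Return the labels of the first sample of `metric`, or {} if absent."""
    for line in exposition.splitlines():
        # "# HELP" / "# TYPE" lines start with "#", so they never match here.
        if line.startswith(metric + "{"):
            label_block = line[len(metric) + 1 : line.rindex("}")]
            return dict(LABEL_RE.findall(label_block))
    return {}


# Illustrative scrape output; the values are invented.
SAMPLE = """# TYPE node_uname_info gauge
node_uname_info{domainname="(none)",machine="x86_64",nodename="worker-1",release="5.15.0-105-generic",sysname="Linux",version="#115-Ubuntu SMP"} 1
"""

labels = metric_labels(SAMPLE, "node_uname_info")
print(labels["release"])
```

Comparing `release` between scrapes of the same node would flag a kernel change. In practice you would more likely write a Prometheus alert rule for this than parse the text by hand.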

CodePudding user response:

From a pure Kubernetes perspective, I think the best you can do is monitor Node events (such as drain, reboot, etc.) and then check whether the node's version has actually changed. You may also be able to watch Node resources directly and check whether the version has changed.
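
As a platform-agnostic sketch of that version check: every Node reports `status.nodeInfo`, which includes `kubeletVersion`, `kernelVersion`, `osImage` and `containerRuntimeVersion`. The code below diffs two snapshots of a node list (for instance two saved outputs of `kubectl get nodes -o json`). The function names, the choice of tracked fields and the `make_node` helper are my own assumptions for illustration, not anything built into Kubernetes.

```python
TRACKED_FIELDS = ("kubeletVersion", "kernelVersion", "osImage", "containerRuntimeVersion")


def node_info_by_name(node_list: dict) -> dict:
    """Map node name -> tracked nodeInfo fields from a NodeList dict."""
    return {
        item["metadata"]["name"]: {
            f: item["status"]["nodeInfo"].get(f) for f in TRACKED_FIELDS
        }
        for item in node_list.get("items", [])
    }


def diff_snapshots(old: dict, new: dict) -> dict:
    """Report nodes that were added, removed, or whose tracked fields changed."""
    before, after = node_info_by_name(old), node_info_by_name(new)
    changes = {}
    for name, info in after.items():
        if name not in before:
            changes[name] = {"added": info}
            continue
        changed = {
            f: (before[name][f], info[f])
            for f in TRACKED_FIELDS
            if before[name][f] != info[f]
        }
        if changed:
            changes[name] = changed
    for name in before.keys() - after.keys():
        changes[name] = {"removed": before[name]}
    return changes


def make_node(name: str, kubelet: str, kernel: str) -> dict:
    """Hypothetical helper that builds a trimmed Node object for the demo."""
    return {
        "metadata": {"name": name},
        "status": {"nodeInfo": {
            "kubeletVersion": kubelet,
            "kernelVersion": kernel,
            "osImage": "Example Linux",
            "containerRuntimeVersion": "containerd://1.7.0",
        }},
    }


old_snapshot = {"items": [make_node("a", "v1.27.3", "5.15"), make_node("b", "v1.27.3", "5.15")]}
new_snapshot = {"items": [make_node("a", "v1.28.1", "5.15"), make_node("c", "v1.28.1", "5.15")]}
print(diff_snapshots(old_snapshot, new_snapshot))
```

Nodes being replaced (added/removed) is itself a useful signal, since many managed platforms upgrade by recreating nodes rather than updating them in place.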

For GKE specifically, you can actually set up cluster notifications and then subscribe to the UpgradeEvent and/or UpgradeAvailableEvent.
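
If you go the GKE route, the notifications arrive as Pub/Sub messages, and the handler mostly has to filter on the event type. The sketch below works on a plain dict of message attributes so it has no dependencies. Note the assumptions: the attribute names (`type_url`, `payload`, `cluster_name`) and the payload keys (`resourceType`, `currentVersion`, `targetVersion`) are what I recall from the GKE cluster notifications docs, so verify them against the current documentation before relying on this. The sample message is invented.

```python
import json
from typing import Optional

# Assumed type URL for upgrade notifications; check the GKE docs.
UPGRADE_EVENT = "type.googleapis.com/google.container.v1beta1.UpgradeEvent"


def describe_upgrade(attributes: dict) -> Optional[str]:
    """Summarise an UpgradeEvent message, or return None for other types."""
    if attributes.get("type_url") != UPGRADE_EVENT:
        return None
    payload = json.loads(attributes.get("payload", "{}"))
    return "{}: {} {} -> {}".format(
        attributes.get("cluster_name", "unknown"),
        payload.get("resourceType", "?"),
        payload.get("currentVersion", "?"),
        payload.get("targetVersion", "?"),
    )


# Invented example of the attributes on one notification message.
sample_attributes = {
    "cluster_name": "demo-cluster",
    "type_url": UPGRADE_EVENT,
    "payload": json.dumps({
        "resourceType": "MASTER",
        "currentVersion": "1.27.3",
        "targetVersion": "1.28.1",
    }),
}

print(describe_upgrade(sample_attributes))
```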

I believe AKS may have recently introduced support for events as well, although it currently seems to support only something similar to the UpgradeAvailableEvent.
