I am having difficulty getting a kubernetes livenessProbe exec command to work with environment variables. My goal is for the liveness probe to monitor memory usage on the pod as well as also perform an httpGet health check.
"If container memory usage exceeds 90% of the resource limits OR the http response code at /health
fails then the probe should fail."
The liveness probe is configured as follows:
livenessProbe:
exec:
command:
- sh
- -c
- |-
"used=$(awk '{ print int($1/1.049e 6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
thresh=$(awk '{ print int( $1 / 1.049e 6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi"
initialDelaySeconds: 240
periodSeconds: 60
failureThreshold: 3
timeoutSeconds: 10
If I exec into the (ubuntu) pod and run these commands they all work fine and do the job.
But when deployed as a livenessProbe the pod is constantly failing with the following warning:
Events: │
│ Type Reason Age From Message │
│ ---- ------ ---- ---- ------- │
│ Warning Unhealthy 14m (x60 over 159m) kubelet (combined from similar events): Liveness probe failed: sh: 4: used=1608; │
│ thresh=2249; │
│ health=200; │
│ if [[ -gt || -ne 200 ]]; then exit 1; fi: not found
It looks as if the initial commands to probe memory and curl the health check endpoint all worked and populated environment variables but then those variable substitutions did not subsequently populate in the if statement so the probe never passes.
Any idea as to why? Or how this could be configured to work properly? I know it's a little bit convoluted. Thanks in advance.
CodePudding user response:
Looks like the shell is seeing your whole command as a filename to execute.
I would remove the outer quotes
livenessProbe:
exec:
command:
- sh
- -c
- |-
used=$(awk '{ print int($1/1.049e 6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
thresh=$(awk '{ print int( $1 / 1.049e 6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi
initialDelaySeconds: 240
periodSeconds: 60
failureThreshold: 3
timeoutSeconds: 10
You're already telling the YAML parser it's a multiline string
CodePudding user response:
I think the root of your issues is the confusion between bash
and sh
(shell). Both are widely available in containers (but bash is sometimes not present) but bash has more features. Here you use [[
which is specific to bash, sh does not know it and may cause unwanted behavior.
First with replace sh
by bash
in your command if it is present in the container. If not you will have to use shell syntax to do conditional commands.
Then your liveness probe can be perfected by leveraging other Kubernetes features:
To avoid a big initial delay, use a Startup probe. It will disable other probes until it responds with one success and should have a high failureThreshold. It allows flexibility in case the container starts faster than expected and centralize the delay (which means no value duplication) when you add an other probe.
Use the
resources
field. It allows you to specify memory and CPU limits and requests (read the documentation) for a specific deployment or pod. Because failing the liveness probe means that your pod will be restarted, setting a limit will do the same thing but cleaner.