I learnt that to run a container as rootless (non-root), you need to either set `securityContext.runAsUser: 1000` or specify the `USER` directive in the Dockerfile.
My question is that there is no UID 1000 on the Kubernetes/Docker host system itself.
I learnt before that Linux user namespaces allow a user to have a different UID outside its original namespace.
So how does UID 1000 exist under the hood? Did the original root (UID 0) create a new user namespace which is represented by UID 1000 in the container?
What happens if we specify UID 2000 instead?
Answer:
Hope this answer helps you.

> I learnt that to run a container as rootless, you need to specify either the SecurityContext:runAsUser 1000 or specify the USER directive in the DOCKERFILE

You are correct, except that it doesn't have to be `runAsUser: 1000`: you can specify any UID, not only 1000. Remember that whatever UID you want to use (`runAsUser: <UID>`) should already exist in the image!
Often, base images already have a user created and available, but leave it up to the development or deployment teams to use it. For example, the official Node.js image comes with a user named `node` at UID 1000 that you can run as, but its Dockerfile does not explicitly switch the current user to it. We either need to configure it at runtime with a `runAsUser` setting or change the current user in the image using a derivative Dockerfile:
```yaml
securityContext:
  runAsUser: 1001 # hardcode user to non-root if not set in Dockerfile
  runAsGroup: 1001 # hardcode group to non-root if not set in Dockerfile
  runAsNonRoot: true # refuse to start the container if it would run as UID 0
```
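As a hedged sketch of the runtime approach for the Node.js case, the pod below runs the official image as UID 1000 and prints the effective UID. The `node:20` tag and the pod/container names are assumptions for illustration, and it assumes the image's `node` user and group are UID/GID 1000 as described above.

```yaml
# Hypothetical pod: run the official Node.js image as its built-in "node" user
apiVersion: v1
kind: Pod
metadata:
  name: node-as-uid-1000        # hypothetical name
spec:
  restartPolicy: Never          # run once, then inspect the logs
  containers:
    - name: app
      image: node:20            # assumed tag; pin an exact version in practice
      # print the effective UID seen by the Node.js process
      command: ["node", "-e", "console.log(process.getuid())"]
      securityContext:
        runAsUser: 1000         # the "node" user the image creates but does not switch to
        runAsGroup: 1000        # the image's "node" group
        runAsNonRoot: true      # guard against accidentally running as UID 0
```

If this works as expected, `kubectl logs node-as-uid-1000` should print `1000`.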
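Remember that `runAsUser` and `runAsGroup` make container processes run as a non-root user, but don't rely on the `runAsUser` or `runAsGroup` settings alone to guarantee this (`runAsUser: 0`, for example, is still root). Be sure to also set `runAsNonRoot: true`, which makes the kubelet refuse to start any container whose user resolves to UID 0.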
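To illustrate what `runAsNonRoot` guards against, here is a sketch that sets only `runAsNonRoot` (no `runAsUser`) on the same assumed `node:20` image, whose Dockerfile does not set `USER` and therefore defaults to root. The names are hypothetical.

```yaml
# Hypothetical pod: runAsNonRoot without runAsUser, on an image that defaults to root
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-guard-demo      # hypothetical name
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true          # kubelet checks the effective UID before starting the container
  containers:
    - name: app
      image: node:20            # assumed tag; no USER in its Dockerfile, so it would run as UID 0
      command: ["node", "-e", "console.log(process.getuid())"]
```

Instead of silently running as root, the container should fail to start; the pod typically shows `CreateContainerConfigError` with a message along the lines of "container has runAsNonRoot and image will run as root". Adding `runAsUser: 1000` to the security context would let it start as the `node` user.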
Here is a full example of a pod spec using `securityContext`:
```yaml
# generic pod spec that's usable inside a deployment or other higher-level k8s spec
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    # basic container details
    - name: my-container-name
      # never use reusable tags like latest or stable
      image: my-image:tag
      # hardcode the listening port if the Dockerfile isn't set with EXPOSE
      ports:
        - containerPort: 8080
          protocol: TCP
      readinessProbe: # I always recommend using these, even if your app has no listening ports (this affects any rolling update)
        httpGet: # lots of timeout values with defaults; be sure they are ideal for your workload
          path: /ready
          port: 8080
      livenessProbe: # only needed if your app tends to go unresponsive or you don't have a readinessProbe, but this is up for debate
        httpGet: # lots of timeout values with defaults; be sure they are ideal for your workload
          path: /alive
          port: 8080
      resources: # QoS is "Guaranteed" only if limits = requests for both CPU and memory
        limits:
          memory: "500Mi" # if the container uses more than 500 MiB it is killed (OOM)
          #cpu: "2" # not normally needed, unless you need to protect other workloads or QoS must be "Guaranteed"
        requests:
          memory: "500Mi" # scheduler finds a node where 500 MiB is available
          cpu: "1" # scheduler finds a node where 1 vCPU is available
      # per-container security context
      # lock down privileges inside the container
      securityContext:
        allowPrivilegeEscalation: false # prevent sudo, etc.
        privileged: false # prevent acting like host root
  terminationGracePeriodSeconds: 600 # default is 30, but you may need more time to shut down gracefully (HTTP long polling, user uploads, etc.)
  # per-pod security context
  # enable seccomp and force non-root user
  securityContext:
    seccompProfile:
      type: RuntimeDefault # enable seccomp and the runtime's default profile
    runAsUser: 1001 # hardcode user to non-root if not set in Dockerfile
    runAsGroup: 1001 # hardcode group to non-root if not set in Dockerfile
    runAsNonRoot: true # refuse to start any container that would run as UID 0
```
Sources:
- Kubernetes Pod Specification Good Defaults
- Configure a Security Context for a Pod or Container
- 10 Kubernetes Security Context settings you should understand
Answer:
Something at the container layer calls the setuid(2) system call with that numeric user ID. There's no particular requirement to "create" a user; if you are able to call `setuid()` at all, you can call it with any numeric UID you want.
You can demonstrate this with plain Docker pretty easily. The `docker run -u` option takes any numeric UID: you can `docker run -u 2000` and your container will (probably) still run. It's common enough to `docker run -u $(id -u)` to run a container with the same numeric user ID as the host user, even though that UID doesn't exist in the container's `/etc/passwd` file.
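The Kubernetes counterpart of `docker run -u 2000` is simply a `runAsUser` value that no user in the image owns. A sketch, assuming the `busybox:1.36` image (any small image with a shell would do) and hypothetical pod/container names:

```yaml
# Hypothetical pod: run as UID/GID 2000, which have no /etc/passwd entry in busybox
apiVersion: v1
kind: Pod
metadata:
  name: uid-2000-demo           # hypothetical name
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 2000             # any numeric UID is accepted; nothing needs to "create" it
    runAsGroup: 2000
  containers:
    - name: show-ids
      image: busybox:1.36       # assumed tag
      # id prints the numeric IDs; whoami is expected to fail since UID 2000 has no passwd entry
      command: ["sh", "-c", "id; whoami || echo 'no /etc/passwd entry for this UID'"]
```

The logs should show `uid=2000` and `gid=2000` without a user name, and `whoami` is expected to fail to find a name. The process itself runs normally; only tools that look up names in `/etc/passwd` notice anything missing, although files in the image owned by other users may not be writable.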
At the Kubernetes layer this is a little less common. A container can't usefully access host files in a clustered environment (...on which host?), so there's no need to have a user ID matching the host's. If the image already sets up a non-root user ID, you should be able to just use it as-is without setting it at the Kubernetes layer.