Where did Kubernetes Security context runAsUser 1000 come from?

I learnt that to run a container as rootless, you need to either specify securityContext.runAsUser: 1000 or set the USER directive in the Dockerfile.

What puzzles me is that no user with UID 1000 exists on the Kubernetes/Docker host system itself.

I learnt before that Linux user namespaces allow a user to have a different UID outside its original namespace.

Hence, how does UID 1000 exist under the hood? Did the original root (UID 0) create a new user namespace which is represented by UID 1000 in the container?

What happens if we specify UID 2000 instead?

CodePudding user response：

Hope this answer helps you

"I learnt that to run a container as rootless, you need to either specify securityContext.runAsUser: 1000 or set the USER directive in the Dockerfile."

You are correct, except that runAsUser is not limited to 1000; you can specify any UID. Ideally, the UID you choose (runAsUser: UID) already exists in the image, so that the process has a passwd entry and owns the files it needs.

Often, base images will already have a user created and available but leave it up to the development or deployment teams to leverage it. For example, the official Node.js image comes with a user named node at UID 1000 that you can run as, but they do not explicitly set the current user to it in their Dockerfile. We will either need to configure it at runtime with a runAsUser setting or change the current user in the image using a derivative Dockerfile.

runAsUser: 1001          # hardcode user to non-root if not set in Dockerfile
runAsGroup: 1001         # hardcode group to non-root if not set in Dockerfile
runAsNonRoot: true       # refuse to start as UID 0; a safety net on top of runAsUser or a USER set in the Dockerfile

Remember that runAsUser and runAsGroup make container processes run as a non-root user, but don't rely on these two settings alone to guarantee it. Also set runAsNonRoot: true, which makes the kubelet refuse to start a container that would run as UID 0.

Here is a full example of securityContext:
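For the Node.js case mentioned above, a minimal sketch of the runtime option might look like this. The pod name, the container name, and the node:20 tag are assumptions made for illustration; only the node user at UID 1000 comes from the image itself.

```yaml
# Sketch only: names and the image tag are placeholders, not part of the original answer.
apiVersion: v1
kind: Pod
metadata:
  name: node-as-non-root        # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: app                 # hypothetical name
      image: node:20            # assumed tag; any official Node.js image that ships the node user
      # print the UID the process actually runs as
      command: ["node", "-e", "console.log(process.getuid())"]
      securityContext:
        runAsUser: 1000         # the node user that the image already provides
        runAsNonRoot: true      # kubelet refuses to start the container as UID 0
        allowPrivilegeEscalation: false
```

If the pod starts, kubectl logs node-as-non-root should show the UID the process ran with.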

# generic pod spec that's usable inside a deployment or other higher level k8s spec

apiVersion: v1
kind: Pod
metadata:
  name: mypod

spec:

  containers:

    # basic container details
    - name: my-container-name
      # never use reusable tags like latest or stable
      image: my-image:tag
      # hardcode the listening port if Dockerfile isn't set with EXPOSE
      ports:
        - containerPort: 8080
          protocol: TCP

      readinessProbe:        # I always recommend using these, even if your app has no listening ports (this affects any rolling update)
        httpGet:             # Lots of timeout values with defaults, be sure they are ideal for your workload
          path: /ready
          port: 8080
      livenessProbe:         # only needed if your app tends to go unresponsive or you don't have a readinessProbe, but this is up for debate
        httpGet:             # Lots of timeout values with defaults, be sure they are ideal for your workload
          path: /alive
          port: 8080

      resources:             # QoS is "Guaranteed" only if limits equal requests for both memory and CPU
        limits:
          memory: "500Mi"    # If the container uses more than 500Mi it is killed (OOM)
          #cpu: "2"          # Not normally needed, unless you need to protect other workloads or QoS must be "Guaranteed"
        requests:
          memory: "500Mi"    # Scheduler finds a node where 500Mi is available
          cpu: "1"           # Scheduler finds a node where 1 vCPU is available

      # per-container security context
      # lock down privileges inside the container
      securityContext:
        allowPrivilegeEscalation: false # prevent sudo, etc.
        privileged: false               # prevent acting like host root
  
  terminationGracePeriodSeconds: 600 # default is 30, but you may need more time to gracefully shutdown (HTTP long polling, user uploads, etc)

  # per-pod security context
  # enable seccomp and force non-root user
  securityContext:

    seccompProfile:
      type: RuntimeDefault   # enable seccomp and the runtime's default profile

    runAsUser: 1001          # hardcode user to non-root if not set in Dockerfile
    runAsGroup: 1001         # hardcode group to non-root if not set in Dockerfile
    runAsNonRoot: true       # refuse to start as UID 0; a safety net on top of runAsUser or a USER set in the Dockerfile


CodePudding user response：

Something at the container layer calls the setuid(2) system call with that numeric user ID. There's no particular requirement to "create" a user; if you are able to call setuid() at all, you can call it with any numeric uid you want.

You can demonstrate this with plain Docker pretty easily. The docker run -u option takes any numeric uid, so you can docker run -u 2000 and your container will (probably) still run. It's common enough to use docker run -u $(id -u) to run a container with the same numeric user ID as the host user, even though that uid doesn't exist in the container's /etc/passwd file.
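The same experiment can be tried at the Kubernetes layer, which also covers the question about UID 2000. This is a rough sketch, assuming a busybox image that has no passwd entry for UID 2000; the pod name, container name, and image tag are placeholders.

```yaml
# Sketch only: runs the id command once as an arbitrary, "non-existent" UID.
apiVersion: v1
kind: Pod
metadata:
  name: uid-demo                # hypothetical name
spec:
  restartPolicy: Never          # run once and stop, so the output can be read from the logs
  securityContext:
    runAsUser: 2000             # arbitrary UID, assumed absent from the image's /etc/passwd
    runAsNonRoot: true
  containers:
    - name: show-id             # hypothetical name
      image: busybox:1.36       # assumed tag
      command: ["id"]
```

kubectl logs uid-demo should then report a numeric uid of 2000 with no user name attached. Because runAsGroup is omitted here, Kubernetes documents the primary group as root (GID 0).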

At a Kubernetes layer this is a little less common. A container can't usefully access host files in a clustered environment (...on which host?) so there's no need to have a user ID matching the host's. If the image already sets up a non-root user ID, you should be able to just use it as-is without setting it at the Kubernetes layer.
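A minimal sketch of that "use the image's user as-is" approach, assuming my-image:tag is a placeholder for an image whose Dockerfile declares a numeric non-root USER (the pod and container names are also hypothetical):

```yaml
# Sketch only: no runAsUser, the image's own USER is kept.
apiVersion: v1
kind: Pod
metadata:
  name: use-image-user          # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true          # verification only; does not override the image's USER
  containers:
    - name: app                 # hypothetical name
      image: my-image:tag       # placeholder; assumed to set a numeric non-root USER
```

If the image declares its user by name rather than by number, the kubelet cannot verify that the user is non-root and rejects the container. A numeric USER in the Dockerfile or an explicit runAsUser avoids that.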