Postgresql data on k8s cannot be made persistent


I am building PostgreSQL on Kubernetes, but I cannot persist the postgres data. The host is a GCP instance running Ubuntu 20.04. The data disks are GCP volumes, attached to the instance and then mounted.

Filesystem      1K-blocks    Used Available Use% Mounted on
/dev/root         9982728 4252592   5713752  43% /
devtmpfs          4038244       0   4038244   0% /dev
tmpfs             4059524       0   4059524   0% /dev/shm
tmpfs              811908    1568    810340   1% /run
tmpfs                5120       0      5120   0% /run/lock
tmpfs             4059524       0   4059524   0% /sys/fs/cgroup
/dev/loop0          50304   50304         0 100% /snap/core18/2671
/dev/loop2         217856  217856         0 100% /snap/google-cloud-cli/98
/dev/loop3          94208   94208         0 100% /snap/lxd/24065
/dev/loop1          60544   60544         0 100% /snap/core20/1782
/dev/loop4          44032   44032         0 100% /snap/snapd/17885
/dev/nvme0n1p15     99801    6004     93797   7% /boot/efi
tmpfs              811904       0    811904   0% /run/user/1001
shm                 65536       0     65536   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/49ebcf7b449f4b13d52aab6f52c28f139c551f83070f6f21207dbf52315dc264/shm
shm                 65536       0     65536   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/e46f0c6b19e5ccff9bb51fa3f7669a9a6a2e7cfccf54681e316a9cd58183dce4/shm
shm                 65536       0     65536   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/495c80e87521bfdda55827df64cdb84cddad149fb502ac7ee12f3607badd4649/shm
shm                 65536       0     65536   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/443e7b254d02c88873c59edc6d5b0a71e80da382ea81105e8b312ad4122d694a/shm
/dev/nvme0n3     10218772      12   9678088   1% /var/lib/postgresql ※ disk for postgres
/dev/nvme0n2      3021608      24   2847916   1% /var/lib/pgadmin ※ disk for pgadmin
shm                 65536    1052     64484   2% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/bd83982e91b6a3bce7853416d72878d5473174e884c15578c47a8d8952f4e718/shm

Also, the pod volumes are allocated using a PersistentVolume and a PersistentVolumeClaim.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: postgres-pv-volume  # Sets PV's name
  labels:
    app: postgres
spec:
  # storageClassName: local-storage
  capacity:
    storage: 10Gi # Sets PV Volume
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/var/lib/postgresql"
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pgadmin-pv-volume  # Sets PV's name
  labels:
    app: pgadmin
spec:
  # storageClassName: local-storage
  capacity:
    storage: 3Gi # Sets PV Volume
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/var/lib/pgadmin"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgresql
  labels:
    app: postgresql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
        - image: docker.io/yamamuratkr/postgres
          name: postgresql
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres
                  key: database_user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres
                  key: database_password
            - name: PGDATA
              value: "/var/lib/postgresql/data/pgdata"
          ports:
            - containerPort: 5432
              name: postgresql
          volumeMounts:
            - name: postgredb
              mountPath: /var/lib/postgresql
      volumes:
        - name: postgredb
          persistentVolumeClaim:
            claimName: postgres-pv-claim
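
The PersistentVolumeClaims themselves are not shown here; a minimal sketch of what the postgres claim presumably looks like (an assumption, since the original manifest is omitted; storageClassName: "" forces binding to a pre-created PV instead of dynamic provisioning):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pv-claim
  labels:
    app: postgres
spec:
  storageClassName: ""  # bind to a pre-created PV, not the default StorageClass
  accessModes:
    - ReadWriteMany  # must match the PV's access modes
  resources:
    requests:
      storage: 10Gi  # must not exceed the PV's capacity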

However, when the pod is deleted, the postgres data disappears with it, so the data is clearly not being persisted.

If you know the cause of this problem, please let me know. Thank you in advance.

None of the following worked:

  • Using hostPath for pod volumes
  • Using the default PGDATA

CodePudding user response:

The source of your problem is here:

volumeMounts:
  - name: postgredb
    mountPath: /var/lib/postgresql

The postgres image itself declares a volume at /var/lib/postgresql/data. We can see that if we inspect the image:

$ docker image inspect docker.io/postgres:14 | jq '.[0].ContainerConfig.Volumes'
{
  "/var/lib/postgresql/data": {}
}

Your mount on /var/lib/postgresql is therefore effectively a no-op: a fresh ephemeral volume is created at /var/lib/postgresql/data each time the container starts, and since that's the default PGDATA location, your data is discarded when the container exits.
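
One way to see this from inside a running pod (a quick check, assuming the Deployment from the question) is to list the mounts under /var/lib/postgresql; you should see two distinct entries, one for the persistent volume at /var/lib/postgresql and one for the ephemeral image volume at /var/lib/postgresql/data:

$ kubectl exec -it deploy/postgresql -- sh -c 'mount | grep /var/lib/postgresql'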

I've put together an example in my local environment to demonstrate this behavior, with a few minor changes from your example that don't affect the outcome.

I've created the following Secret with the postgres credentials; by naming the keys like this we can use a single envFrom block instead of multiple env entries (see the Deployment for details):

apiVersion: v1
kind: Secret
metadata:
  name: postgres-env
type: Opaque
stringData:
  POSTGRES_USER: example
  POSTGRES_PASSWORD: secret
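
Equivalently, the same Secret can be created imperatively (shown here as an alternative; the YAML above is what I actually applied):

$ kubectl create secret generic postgres-env \
    --from-literal=POSTGRES_USER=example \
    --from-literal=POSTGRES_PASSWORD=secret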

And in line with my comment I'm injecting a file into /docker-entrypoint-initdb.d from this ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  initdb.sql: |
    CREATE TABLE people (
      id SERIAL,
      name VARCHAR(40),
      favorite_color VARCHAR(10)
    );

    INSERT INTO people (name, favorite_color) VALUES ('bob', 'red');
    INSERT INTO people (name, favorite_color) VALUES ('alice', 'blue');
    INSERT INTO people (name, favorite_color) VALUES ('mallory', 'purple');

I'm using this PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pv-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
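
This claim specifies no storageClassName, so it will be satisfied by the cluster's default StorageClass (or bind to a matching pre-created PV). Before moving on, you can verify that it is Bound:

$ kubectl get pvc postgres-pv-claim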

And finally I'm tying it all together in this Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: postgres
  name: postgresql
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - envFrom:
        - secretRef:
            name: postgres-env
        image: docker.io/postgres:14
        name: postgresql
        ports:
        - containerPort: 5432
          name: postgresql
        volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: postgres-config
        - mountPath: /var/lib/postgresql
          name: postgres-data
      volumes:
      - name: postgres-data
        persistentVolumeClaim:
          claimName: postgres-pv-claim
      - configMap:
          name: postgres-config
        name: postgres-config

The significant changes here are:

  1. I'm using a single envFrom block to set environment variables from the keys in the postgres-env secret.

  2. I'm using the upstream docker.io/postgres:14 image rather than building my own custom image.

  3. I'm injecting the contents of /docker-entrypoint-initdb.d from the postgres-config ConfigMap.

Note that this deployment is using the same mountPath as in your example.
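
Assuming each manifest above is saved to its own file (the filenames here are arbitrary), the whole environment comes up with a single apply:

$ kubectl apply -f secret.yaml -f configmap.yaml -f pvc.yaml -f deployment.yaml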


If I bring up this environment, I can see that the database initialization script was executed correctly. The people table exists and has the expected data:

$ kubectl exec -it deploy/postgresql -- psql -U example -c 'select * from people'
 id |  name   | favorite_color
----+---------+----------------
  1 | bob     | red
  2 | alice   | blue
  3 | mallory | purple
(3 rows)

Let's make a change to the database and see what happens when we restart the pod. First, we add a new row to the table and view the updated table:

$ kubectl exec -it deploy/postgresql -- psql -U example -c "insert into people (name, favorite_color) values ('carol', 'green')"
INSERT 0 1
$ kubectl exec -it deploy/postgresql -- psql -U example -c 'select * from people'
 id |  name   | favorite_color
----+---------+----------------
  1 | bob     | red
  2 | alice   | blue
  3 | mallory | purple
  4 | carol   | green
(4 rows)

Now we restart the database pod:

$ kubectl rollout restart deployment/postgresql

And finally check if our changes survived the restart:

$ kubectl exec -it deploy/postgresql -- psql -U example -c 'select * from people'
 id |  name   | favorite_color
----+---------+----------------
  1 | bob     | red
  2 | alice   | blue
  3 | mallory | purple
(3 rows)

As expected, they did not! The table is back to its original three rows because the restarted container got a fresh, empty volume at /var/lib/postgresql/data, so the image re-ran the init scripts from /docker-entrypoint-initdb.d (they only run when the data directory is empty). Let's change the mountPath in the Deployment so that it looks like this instead:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: postgres
  name: postgresql
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - envFrom:
        - secretRef:
            name: postgres-env
        image: docker.io/postgres:14
        name: postgresql
        ports:
        - containerPort: 5432
          name: postgresql
        volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: postgres-config
        - mountPath: /var/lib/postgresql/data
          name: postgres-data
      volumes:
      - name: postgres-data
        persistentVolumeClaim:
          claimName: postgres-pv-claim
      - configMap:
          name: postgres-config
        name: postgres-config

Using this Deployment, with no other changes, we can re-run the previous test and see that our data persists as desired:

$ kubectl exec -it deploy/postgresql -- psql -U example -c "insert into people (name, favorite_color) values ('carol', 'green')"
INSERT 0 1
$ kubectl rollout restart deployment/postgresql
deployment.apps/postgresql restarted
$ kubectl exec -it deploy/postgresql -- psql -U example -c 'select * from people'
 id |  name   | favorite_color
----+---------+----------------
  1 | bob     | red
  2 | alice   | blue
  3 | mallory | purple
  4 | carol   | green
(4 rows)
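
As a final sanity check (optional; the exact device and sizes will depend on your storage provisioner), you can confirm that the data directory is now backed by the claim rather than an ephemeral volume:

$ kubectl exec -it deploy/postgresql -- df -h /var/lib/postgresql/data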

An alternative solution would be to mount your volume in a completely different location and then set PGDATA appropriately. E.g.,

          ...
          env:
            - name: PGDATA
              value: /data
          ...
          volumeMounts:
            - name: postgres-data
              mountPath: /data
          ...
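
One caveat with mounting a raw disk at PGDATA directly: initdb refuses to initialize a non-empty directory, and a freshly formatted filesystem typically contains lost+found at its root. A common workaround (a sketch, not part of the original answer) is to mount the volume with a subPath so Postgres sees an empty subdirectory:

          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: pgdata  # Postgres sees only this (initially empty) subdirectory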