I have a Django application deployed in GKE (set up by following this tutorial).
My configuration file: myapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-app
        image: gcr.io/myproject/myapp
        imagePullPolicy: IfNotPresent
        ---------
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
settings.py
import os

# The cloudsql-proxy sidecar listens locally, so Django connects to 127.0.0.1.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': os.environ['DATABASE_NAME'],
        'USER': os.environ['DATABASE_USER'],
        'PASSWORD': os.environ['DATABASE_PASSWORD'],
        'HOST': '127.0.0.1',
        'PORT': os.getenv('DATABASE_PORT', '3306'),
    }
}
Now, when I do a rolling update, either via
kubectl rollout restart deployment myapp
or
kubectl apply -f myapp.yaml
the output of
kubectl get pods
is in the following state:
NAME                     READY   STATUS        RESTARTS   AGE
myapp-8477898cff-5wztr   2/2     Terminating   0          88s
myapp-8477898cff-ndt5b   2/2     Terminating   0          85s
myapp-8477898cff-qxzsh   2/2     Terminating   0          82s
myapp-97d6ccfc4-4qmpj    2/2     Running       0          6s
myapp-97d6ccfc4-vr6mb    2/2     Running       0          4s
myapp-97d6ccfc4-xw294    2/2     Running       0          7s
I get the following error for some time while the rollout is in progress:
OperationalError at /
(2003, "Can't connect to MySQL server on '127.0.0.1' (111)")
Please advise how I can adjust the settings to get a rollout without downtime/this error.
UPD
I have figured out by looking at the logs that this happens because cloudsql-proxy
is brought down first, while the application container is still alive.
Log of app:
Found 3 pods, using pod/myapp-f59c686b5-6t7c4
[2022-02-27 17:39:55 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2022-02-27 17:39:55 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[2022-02-27 17:39:55 +0000] [7] [INFO] Using worker: sync
[2022-02-27 17:39:55 +0000] [10] [INFO] Booting worker with pid: 10
Internal Server Error: /api/health/ # here cloudsql-proxy died
Internal Server Error: /api/health/
Internal Server Error: /api/health/
.... here more messages of Internal Server Error ...
rpc error: code = NotFound desc = an error occurred when try to find container "ec7658770c772eff6efb544a502fcd1841d7401add6efb2b53bf264b8eca1bb6": not found
Log of cloudsql-proxy
2022/02/28 08:17:58 New connection for "myapp:europe-north1:myapp"
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:59 Received TERM signal. Waiting up to 0s before terminating.
So I guess the solution should be to enforce the shutdown order: somehow shut down the application before shutting down cloudsql-proxy when the pod is updated (see the sketch below).
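One way this could look, purely as a sketch of the idea rather than something I have verified, is a preStop hook on the cloudsql-proxy container that delays its SIGTERM so the proxy keeps running while the app finishes in-flight requests. This assumes the gce-proxy image provides /bin/sh and sleep (use a shell-equipped image variant if it does not) and that the sleep fits inside the pod's terminationGracePeriodSeconds (30s by default):

      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        lifecycle:
          preStop:
            exec:
              # Kubernetes runs preStop before sending SIGTERM to this container,
              # so the proxy stays up while the app container shuts down.
              command: ["/bin/sh", "-c", "sleep 20"]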
CodePudding user response:
It looks like the proxy sidecar is terminating without letting the application clean up first.
Consider using the -term_timeout
flag to give yourself some time: https://github.com/GoogleCloudPlatform/cloudsql-proxy#-term_timeout30s
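Applied to the Deployment above, that would look roughly like this; the 30s value is only an example (pick something that covers your longest request, and keep terminationGracePeriodSeconds at least that long):

      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        # With -term_timeout the proxy waits up to the given duration for existing
        # connections to close after SIGTERM, instead of exiting immediately
        # ("Waiting up to 0s before terminating" in the log above).
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json",
                  "-term_timeout=30s"]

Gunicorn's default graceful timeout is 30s, so matching that keeps the proxy around for as long as the workers may still be finishing requests.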