nginx ingress controller returns fake certificate for one ingress, but the right certificate for another


I have this weird situation: there are two services, each with its own ingress spec, and the specs are very similar. For one of them the ingress controller returns the expected certificate, while for the other it returns the fake one. I have been banging my head against this since morning, to no avail.

Let us refer to the two services as bad.xyz.com and good.xyz.com, respectively.

bad_host=bad.xyz.com
bad=https://$bad_host/dev/master/deuremittanceservice
good=https://good.xyz.com/dev/master/helloworldservice

The good one works

~$ curl $good
Hey Hello!

ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34
~$

The bad one does not work

~$ curl $bad
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
~$

How do I know it is the fake certificate that is being returned? Observe:

~$ echo | openssl s_client -showcerts -servername $bad_host -connect $bad_host:443  2>/dev/null | openssl x509 -inform pem -noout -text | grep Subject:
        Subject: O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
~$
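By the way, if the ingress-nginx kubectl plugin happens to be installed, it can dump the certificate the controller actually has loaded for a host (a sketch; the namespace and deployment names are the ones from this cluster):

~$ kubectl ingress-nginx certs -n nginx-internal-ingress \
      --deployment external-nginx-ingress-ingress-nginx-controller --host bad.xyz.com

If that prints the fake certificate, the controller never associated the real secret with this server name.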

Ingress specs are very similar

~$ k get ing deuremittanceservice-master-ingress helloworldservice-master-ingress
NAME                               CLASS            HOSTS          ADDRESS         PORTS     AGE
deuremittanceservice-master-ingress    nginx-internal   bad.xyz.com    10.16.241.242   80, 443   144d
helloworldservice-master-ingress   nginx-internal   good.xyz.com   10.16.241.242   80, 443   18d
~$ diff -U0 <(k get ing deuremittanceservice-master-ingress -o yaml | k neat) <(k get ing helloworldservice-master-ingress -o yaml | k neat)
--- /dev/fd/63  2023-01-29 11:39:26.368667000 -0500
+++ /dev/fd/62  2023-01-29 11:39:26.368667000 -0500
@@ -7 +7 @@
-  name: deuremittanceservice-master-ingress
+  name: helloworldservice-master-ingress
@@ -12 +12 @@
-  - host: bad.xyz.com
+  - host: good.xyz.com
@@ -17 +17 @@
-            name: deuremittanceservice-master
+            name: helloworldservice-master
@@ -20 +20 @@
-        path: /dev/master/deuremittanceservice(/|$)(.*)
+        path: /dev/master/helloworldservice(/|$)(.*)
@@ -24,2 +24,2 @@
-    - bad.xyz.com
-    secretName: deuremittanceservice-master-tls-secret
+    - good.xyz.com
+    secretName: helloworldservice-master-tls-secret
~$

The secrets they refer to are different k8s objects, but they contain exactly the same certificate and private key:

~$ k get secret deuremittanceservice-master-tls-secret -o jsonpath='{.data.tls\.crt}' | wc -c
7284
~$ k get secret deuremittanceservice-master-tls-secret -o jsonpath='{.data.tls\.key}' | wc -c
2236
~$ diff -U0 <(k get secret deuremittanceservice-master-tls-secret -o yaml | k neat) <(k get secret helloworldservice-master-tls-secret -o yaml | k neat)
--- /dev/fd/63  2023-01-29 11:40:51.615178000 -0500
+++ /dev/fd/62  2023-01-29 11:40:51.615178000 -0500
@@ -7 +7 @@
-  name: deuremittanceservice-master-tls-secret
+  name: helloworldservice-master-tls-secret
~$
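To double-check that what is served on the wire really differs from what is stored in the secret, comparing SHA-256 fingerprints is enough (a minimal sketch using the names above):

~$ echo | openssl s_client -servername $bad_host -connect $bad_host:443 2>/dev/null \
      | openssl x509 -noout -fingerprint -sha256
~$ k get secret deuremittanceservice-master-tls-secret -o jsonpath='{.data.tls\.crt}' \
      | base64 -d | openssl x509 -noout -fingerprint -sha256

Different fingerprints confirm the controller is serving something other than this secret.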

Service specs are very similar

~$ diff -U0 <(k get svc deuremittanceservice-master -o yaml | k neat) <(k get svc helloworldservice-master -o yaml | k neat)
--- /dev/fd/63  2023-01-29 11:41:23.280323000 -0500
+++ /dev/fd/62  2023-01-29 11:41:23.280323000 -0500
@@ -4 +4 @@
-  name: deuremittanceservice-master
+  name: helloworldservice-master
@@ -7 +7 @@
-  clusterIP: 10.0.46.117
+  clusterIP: 10.0.236.30
@@ -9 +9 @@
-  - 10.0.46.117
+  - 10.0.236.30
@@ -17 +17 @@
-    app: deuremittanceservice-master
+    app: helloworldservice-master
~$

I checked the ingress controller logs at the debug level, but could not find anything useful there.

How do we troubleshoot something like this?

EDIT 1

Ingress controller

We use nginx; here is the deployment YAML:

~$ k -n nginx-internal-ingress get deployments.apps
NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
external-nginx-ingress-ingress-nginx-controller   2/2     2            2           151d
~$ k -n nginx-internal-ingress get deployments.apps external-nginx-ingress-ingress-nginx-controller -o yaml | k neat
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "20"
    meta.helm.sh/release-name: external-nginx-ingress
    meta.helm.sh/release-namespace: nginx-internal-ingress
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: external-nginx-ingress
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: external-nginx-ingress-ingress-nginx-controller
  namespace: nginx-internal-ingress
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: external-nginx-ingress
      app.kubernetes.io/name: ingress-nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2023-01-27T13:10:54-05:00"
        prometheus.io/path: /mymetrics
        prometheus.io/port: "8000"
        prometheus.io/scheme: http
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: external-nginx-ingress
        app.kubernetes.io/name: ingress-nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - toolsnp1
      containers:
      - args:
        - /nginx-ingress-controller
        - --publish-service=$(POD_NAMESPACE)/external-nginx-ingress-ingress-nginx-controller-internal
        - --election-id=ingress-controller-leader
        - --controller-class=k8s.io/nginx-internal
        - --ingress-class=nginx-internal
        - --configmap=$(POD_NAMESPACE)/external-nginx-ingress-ingress-nginx-controller
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_PRELOAD
          value: /usr/local/lib/libmimalloc.so
        image: mycr.azurecr.io/ingress-nginx/controller:v1.0.4
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 10254
          name: metrics
          protocol: TCP
        - containerPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/certificates/
          name: webhook-cert
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: external-nginx-ingress-ingress-nginx
      serviceAccountName: external-nginx-ingress-ingress-nginx
      terminationGracePeriodSeconds: 300
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: tools
      volumes:
      - name: webhook-cert
        secret:
          defaultMode: 420
          secretName: external-nginx-ingress-ingress-nginx-admission
~$

Please ignore the mixup of the "internal" and "external" terms in the various names.
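Note that the controller watches the nginx-internal ingress class across all namespaces, so every ingress carrying that class is in scope, not only those in the namespace being debugged. To list them all (a sketch, assuming jq is available):

~$ k get ing -A -o json \
      | jq -r '.items[] | select(.spec.ingressClassName == "nginx-internal")
               | "\(.metadata.namespace)/\(.metadata.name)\t\(.spec.rules[]?.host)"'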

Ingress Logs

I am going to define an auxiliary function (it uses the $good and $bad variables defined above):

function run_with_logs() {
    url=$1
    now=$(date -u '+%Y-%m-%dT%H:%M:%S.%2NZ')
    curl $url
    echo -e "\n--- LOGS ---"
    k -n nginx-internal-ingress logs -l 'app.kubernetes.io/component=controller' -c controller --since-time="$now" --tail 100000
}

The function curls the given URL and then prints the ingress controller logs produced since the curl started.

The result:

~$ run_with_logs $good
Hey Hello!

ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34
--- LOGS ---
10.16.240.237 - - [29/Jan/2023:15:07:10 +0000] "GET /dev/master/helloworldservice HTTP/2.0" 200 48 "-" "curl/7.75.0" 68 0.008 [dev-dfpayroll-helloworldservice-master-80] [] 10.16.240.38:80 59 0.008 200 67f7423632cb16c1019e80d0e38827a8

~$ run_with_logs $bad
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

--- LOGS ---

~$

So at normal verbosity there is not much information. Let me bump the error log level:

~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o yaml | yq '.data.error-log-level="info"' -M | k apply -f-
Warning: resource configmaps/external-nginx-ingress-ingress-nginx-controller is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
configmap/external-nginx-ingress-ingress-nginx-controller configured
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o 'jsonpath={.data.error-log-level}{"\n"}'
info
~$
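As an aside, the kubectl apply warning above can be avoided by patching the ConfigMap directly; the controller picks up ConfigMap changes without a restart (same ConfigMap name as above):

~$ k -n nginx-internal-ingress patch cm external-nginx-ingress-ingress-nginx-controller \
      --type merge -p '{"data":{"error-log-level":"info"}}'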

Now checking the logs.

Good

~$ run_with_logs $good
Hey Hello!

ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34
--- LOGS ---
2023/01/29 16:48:16 [info] 450#450: *81577 client closed connection while SSL handshaking, client: 10.16.240.153, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81578 client closed connection while waiting for request, client: 10.16.240.153, server: 0.0.0.0:80
2023/01/29 16:48:16 [info] 447#447: *81214 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
10.16.240.150 - - [29/Jan/2023:16:48:16 +0000] "GET /dev/master/helloworldservice HTTP/2.0" 200 48 "-" "curl/7.81.0" 68 0.007 [dev-dfpayroll-helloworldservice-master-80] [] 10.16.240.38:80 59 0.004 200 45c741b465f0e66f39f4b6719db17cba
2023/01/29 16:48:16 [info] 450#450: *81218 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:16 [info] 447#447: *81219 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81220 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 447#447: *81221 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81222 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80

~$

Bad

~$ run_with_logs $bad
curl: (60) SSL certificate problem: self-signed certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

--- LOGS ---
2023/01/29 16:48:51 [info] 447#447: *81865 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:51 [info] 448#448: *81866 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:51 [info] 450#450: *81867 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:51 [info] 448#448: *81868 SSL_do_handshake() failed (SSL: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:SSL alert number 48) while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443

~$

The tlsv1 alert unknown ca entry (SSL alert number 48) is actually sent by the client: curl aborts the handshake because it cannot verify the self-signed fake certificate. That still does not explain why the fake certificate is being chosen, so let me raise the level to debug:

~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o yaml | yq '.data.error-log-level="debug"' -M | k apply -f-
configmap/external-nginx-ingress-ingress-nginx-controller configured
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o 'jsonpath={.data.error-log-level}{"\n"}'
debug
~$

The result:

~$ run_with_logs $good > /c/Temp/good.log
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    48    0    48    0     0    313      0 --:--:-- --:--:-- --:--:--   315

~$ run_with_logs $bad > /c/Temp/bad.log
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

~$

The logs can be found here:
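One thing worth grepping for in those debug logs is the SNI handling; at debug level nginx logs the server name the client sent (a sketch; the message text is standard nginx debug output, assuming the binary was built with debug support):

~$ grep 'SSL server name' /c/Temp/bad.log

If the name shows up but the fake certificate is still served, the controller simply has no usable certificate associated with that server name.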

EDIT 2

mark@L-R910LPKW:~$ curl -k $bad
mark@L-R910LPKW:~$ curl -kv $bad
*   Trying 10.16.241.242:443...
* Connected to bad.xyz.com (10.16.241.242) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  start date: Jan 29 15:34:09 2023 GMT
*  expire date: Jan 29 15:34:09 2024 GMT
*  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  SSL certificate verify result: self-signed certificate (18), continuing anyway.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x55d9ef165550)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /dev/master/deuremittanceservice HTTP/2
> Host: bad.xyz.com
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 404
< date: Sun, 29 Jan 2023 16:53:27 GMT
< content-length: 0
< request-context: appId=cid-v1:6ef5baff-1666-4d2a-801d-c99a97e9be30
< strict-transport-security: max-age=15724800; includeSubDomains
<
* Connection #0 to host bad.xyz.com left intact
mark@L-R910LPKW:~$

EDIT 3

~$ k -n nginx-internal-ingress logs -l 'app.kubernetes.io/component=controller' -c controller --tail 100000 | grep deuremittanceservice-master-ingress
I0129 17:52:22.050441       7 store.go:371] "Found valid IngressClass" ingress="qa-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.050590       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"qa-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"c4ce9dc9-049c-4116-9b4e-f2b83b163785", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443506", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:22.057749       7 store.go:371] "Found valid IngressClass" ingress="auto-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.058466       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"auto-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"74d82adc-2885-4ca8-bddb-6302c43851b7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443480", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:22.062629       7 store.go:371] "Found valid IngressClass" ingress="dev-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.063874       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"09753eea-2ac4-41d1-9f4b-3da025442f87", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184442836", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.204055      11 store.go:371] "Found valid IngressClass" ingress="auto-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.204158      11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"auto-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"74d82adc-2885-4ca8-bddb-6302c43851b7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443480", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.206366      11 store.go:371] "Found valid IngressClass" ingress="dev-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.206506      11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"09753eea-2ac4-41d1-9f4b-3da025442f87", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184442836", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.212198      11 store.go:371] "Found valid IngressClass" ingress="qa-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.212328      11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"qa-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"c4ce9dc9-049c-4116-9b4e-f2b83b163785", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443506", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync

~$
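Those Sync events come from three different namespaces (qa-dfpayroll, auto-dfpayroll and dev-dfpayroll), so the same ingress name exists in all of them. A quick way to see every copy at once:

~$ k get ing -A | grep deuremittanceservice-master-ingress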

CodePudding user response:

I found the root cause. There was no problem in the namespace I was checking (dev-dfpayroll). However, the feature team owns other namespaces too, and in one of them they had created another deployment using the same FQDN (with a different ingress path, so technically that was fine), but they botched the certificate in that ingress.

All the ingresses that use the same FQDN must specify the same server certificate; if they do not, bad things happen, and this is exactly such a case. The controller builds a single nginx server block per hostname, so when several ingresses supply conflicting TLS sections, which certificate ends up being served is effectively arbitrary from the user's point of view, and a broken one can shadow the good one.
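To find every ingress that claims a given FQDN across all namespaces, together with the TLS secret each one names, something like this works (a sketch, assuming jq):

~$ k get ing -A -o json \
      | jq -r '.items[]
               | select(any(.spec.rules[]?; .host == "bad.xyz.com"))
               | [.metadata.namespace, .metadata.name, (.spec.tls[]?.secretName // "-")]
               | @tsv'

Any row whose secret differs in content from the rest is a candidate culprit.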

There are two ways to address this going forward:

  • Configure a default certificate and let teams use it instead (see the sketch after this list).
  • Configure a policy ensuring that all the ingresses using the same FQDN provide exactly the same server certificate, or none at all.
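
For the first option, ingress-nginx supports a cluster-wide default certificate via a controller argument; ingresses then list their host under tls: but omit secretName (a sketch; the wildcard secret name below is hypothetical):

      containers:
      - args:
        - /nginx-ingress-controller
        - --default-ssl-certificate=nginx-internal-ingress/wildcard-xyz-com-tls
        ...

With that in place teams no longer need to provide per-ingress secrets at all.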