I have a weird situation: there are two services, each with its own, very similar ingress spec. For one of them the ingress controller returns the expected certificate, while for the other it returns the fake one. I have been banging my head against this since morning to no avail.
Let us refer to the two services as bad.xyz.com and good.xyz.com respectively.
bad_host=bad.xyz.com
bad=https://$bad_host/dev/master/deuremittanceservice
good=https://good.xyz.com/dev/master/helloworldservice
The good one works
~$ curl $good
Hey Hello!
ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34~$
The bad one does not work
~$ curl $bad
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
~$
How do I know it is the fake certificate that is being returned? Observe:
~$ echo | openssl s_client -showcerts -servername $bad_host -connect $bad_host:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep Subject:
Subject: O = Acme Co, CN = Kubernetes Ingress Controller Fake Certificate
~$
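For contrast, the same check can be pointed at the good host; this one should print the real certificate's subject rather than the fake one (plain openssl, nothing cluster-specific):
# Same probe against the good host; the subject should be that of the real certificate.
echo | openssl s_client -showcerts -servername good.xyz.com -connect good.xyz.com:443 2>/dev/null \
  | openssl x509 -inform pem -noout -subject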
Ingress specs are very similar
~$ k get ing deuremittanceservice-master-ingress helloworldservice-master-ingress
NAME                                  CLASS            HOSTS          ADDRESS         PORTS     AGE
deuremittanceservice-master-ingress   nginx-internal   bad.xyz.com    10.16.241.242   80, 443   144d
helloworldservice-master-ingress      nginx-internal   good.xyz.com   10.16.241.242   80, 443   18d
~$ diff -U0 <(k get ing deuremittanceservice-master-ingress -o yaml | k neat) <(k get ing helloworldservice-master-ingress -o yaml | k neat)
--- /dev/fd/63 2023-01-29 11:39:26.368667000 -0500
+++ /dev/fd/62 2023-01-29 11:39:26.368667000 -0500
@@ -7 +7 @@
- name: deuremittanceservice-master-ingress
+ name: helloworldservice-master-ingress
@@ -12 +12 @@
- - host: bad.xyz.com
+ - host: good.xyz.com
@@ -17 +17 @@
- name: deuremittanceservice-master
+ name: helloworldservice-master
@@ -20 +20 @@
- path: /dev/master/deuremittanceservice(/|$)(.*)
+ path: /dev/master/helloworldservice(/|$)(.*)
@@ -24,2 +24,2 @@
- - bad.xyz.com
- secretName: deuremittanceservice-master-tls-secret
+ - good.xyz.com
+ secretName: helloworldservice-master-tls-secret
~$
The secrets they refer to are different k8s objects, but they contain exactly the same certificate and private key:
~$ k get secret deuremittanceservice-master-tls-secret -o jsonpath='{.data.tls\.crt}' | wc -c
7284
~$ k get secret deuremittanceservice-master-tls-secret -o jsonpath='{.data.tls\.key}' | wc -c
2236
~$ diff -U0 <(k get secret deuremittanceservice-master-tls-secret -o yaml | k neat) <(k get secret helloworldservice-master-tls-secret -o yaml | k neat)
--- /dev/fd/63 2023-01-29 11:40:51.615178000 -0500
+++ /dev/fd/62 2023-01-29 11:40:51.615178000 -0500
@@ -7 +7 @@
- name: deuremittanceservice-master-tls-secret
+ name: helloworldservice-master-tls-secret
~$
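If the diff of the neat-ed YAML is not convincing enough, the certificate fingerprints and key hashes stored in the two secrets can be compared directly (a sketch using only standard tools):
# Identical fingerprints/hashes on both secrets confirm they hold the same material.
for s in deuremittanceservice-master-tls-secret helloworldservice-master-tls-secret; do
  echo "== $s"
  k get secret $s -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -fingerprint -sha256
  k get secret $s -o jsonpath='{.data.tls\.key}' | base64 -d | sha256sum
done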
Service specs are very similar
~$ diff -U0 <(k get svc deuremittanceservice-master -o yaml | k neat) <(k get svc helloworldservice-master -o yaml | k neat)
--- /dev/fd/63 2023-01-29 11:41:23.280323000 -0500
+++ /dev/fd/62 2023-01-29 11:41:23.280323000 -0500
@@ -4 +4 @@
- name: deuremittanceservice-master
+ name: helloworldservice-master
@@ -7 +7 @@
- clusterIP: 10.0.46.117
+ clusterIP: 10.0.236.30
@@ -9 +9 @@
- - 10.0.46.117
+ - 10.0.236.30
@@ -17 +17 @@
- app: deuremittanceservice-master
+ app: helloworldservice-master
~$
I checked the ingress controller logs at the debug level, but could not find anything useful there.
How do we troubleshoot something like this?
EDIT 1
Ingress controller
We use nginx, here is the deployment YAML:
~$ k -n nginx-internal-ingress get deployments.apps
NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
external-nginx-ingress-ingress-nginx-controller   2/2     2            2           151d
~$ k -n nginx-internal-ingress get deployments.apps external-nginx-ingress-ingress-nginx-controller -o yaml | k neat
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "20"
    meta.helm.sh/release-name: external-nginx-ingress
    meta.helm.sh/release-namespace: nginx-internal-ingress
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: external-nginx-ingress
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: external-nginx-ingress-ingress-nginx-controller
  namespace: nginx-internal-ingress
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: external-nginx-ingress
      app.kubernetes.io/name: ingress-nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2023-01-27T13:10:54-05:00"
        prometheus.io/path: /mymetrics
        prometheus.io/port: "8000"
        prometheus.io/scheme: http
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: external-nginx-ingress
        app.kubernetes.io/name: ingress-nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - toolsnp1
      containers:
      - args:
        - /nginx-ingress-controller
        - --publish-service=$(POD_NAMESPACE)/external-nginx-ingress-ingress-nginx-controller-internal
        - --election-id=ingress-controller-leader
        - --controller-class=k8s.io/nginx-internal
        - --ingress-class=nginx-internal
        - --configmap=$(POD_NAMESPACE)/external-nginx-ingress-ingress-nginx-controller
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_PRELOAD
          value: /usr/local/lib/libmimalloc.so
        image: mycr.azurecr.io/ingress-nginx/controller:v1.0.4
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 10254
          name: metrics
          protocol: TCP
        - containerPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/certificates/
          name: webhook-cert
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: external-nginx-ingress-ingress-nginx
      serviceAccountName: external-nginx-ingress-ingress-nginx
      terminationGracePeriodSeconds: 300
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: tools
      volumes:
      - name: webhook-cert
        secret:
          defaultMode: 420
          secretName: external-nginx-ingress-ingress-nginx-admission
~$
Please ignore the mix-up of the "internal" and "external" terms in the various names.
Ingress Logs
I am going to define an auxiliary function (the variables were defined at the top of the question):
function run_with_logs() {
  url=$1
  now=`date -u '+%Y-%m-%dT%H:%M:%S.%2NZ'` ; curl $url ; echo -e "\n--- LOGS ---" ; k -n nginx-internal-ingress logs -l 'app.kubernetes.io/component=controller' -c controller --since-time="$now" --tail 100000
}
The function curls the given URL and then prints the ingress controller logs produced since the request was made.
The result:
~$ run_with_logs $good
Hey Hello!
ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34
--- LOGS ---
10.16.240.237 - - [29/Jan/2023:15:07:10 +0000] "GET /dev/master/helloworldservice HTTP/2.0" 200 48 "-" "curl/7.75.0" 68 0.008 [dev-dfpayroll-helloworldservice-master-80] [] 10.16.240.38:80 59 0.008 200 67f7423632cb16c1019e80d0e38827a8
~$ run_with_logs $bad
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
--- LOGS ---
~$
So at normal verbosity there is not much information. Let me bump the error log level:
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o yaml | yq '.data.error-log-level="info"' -M | k apply -f-
Warning: resource configmaps/external-nginx-ingress-ingress-nginx-controller is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
configmap/external-nginx-ingress-ingress-nginx-controller configured
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o 'jsonpath={.data.error-log-level}{"\n"}'
info
~$
Now checking the logs.
Good
~$ run_with_logs $good
Hey Hello!
ASPNET: 6.0.9
BUILD_NUMBER: 1.0.0.34
--- LOGS ---
2023/01/29 16:48:16 [info] 450#450: *81577 client closed connection while SSL handshaking, client: 10.16.240.153, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81578 client closed connection while waiting for request, client: 10.16.240.153, server: 0.0.0.0:80
2023/01/29 16:48:16 [info] 447#447: *81214 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
10.16.240.150 - - [29/Jan/2023:16:48:16 +0000] "GET /dev/master/helloworldservice HTTP/2.0" 200 48 "-" "curl/7.81.0" 68 0.007 [dev-dfpayroll-helloworldservice-master-80] [] 10.16.240.38:80 59 0.004 200 45c741b465f0e66f39f4b6719db17cba
2023/01/29 16:48:16 [info] 450#450: *81218 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:16 [info] 447#447: *81219 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81220 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 447#447: *81221 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:16 [info] 448#448: *81222 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
~$
Bad
~$ run_with_logs $bad
curl: (60) SSL certificate problem: self-signed certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
--- LOGS ---
2023/01/29 16:48:51 [info] 447#447: *81865 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:51 [info] 448#448: *81866 client closed connection while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
2023/01/29 16:48:51 [info] 450#450: *81867 client closed connection while waiting for request, client: 10.16.240.150, server: 0.0.0.0:80
2023/01/29 16:48:51 [info] 448#448: *81868 SSL_do_handshake() failed (SSL: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:SSL alert number 48) while SSL handshaking, client: 10.16.240.150, server: 0.0.0.0:443
~$
The "unknown ca" alert just means curl told the server it does not trust the certificate it was served, so this still does not explain why the fake certificate is picked. Let me raise the log level to debug:
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o yaml | yq '.data.error-log-level="debug"' -M | k apply -f-
configmap/external-nginx-ingress-ingress-nginx-controller configured
~$ k -n nginx-internal-ingress get cm external-nginx-ingress-ingress-nginx-controller -o 'jsonpath={.data.error-log-level}{"\n"}'
debug
~$
The result:
~$ run_with_logs $good > /c/Temp/good.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 48 0 48 0 0 313 0 --:--:-- --:--:-- --:--:-- 315
~$ run_with_logs $bad > /c/Temp/bad.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
~$
The logs can be found here:
- good.log - https://gist.githubusercontent.com/MarkKharitonov/63a1f7da0cb0e29ecfb109dc2eab988f/raw/c8114914daa4123ec391df08425f6c36946be77e/good.log
- bad.log - https://gist.githubusercontent.com/MarkKharitonov/63a1f7da0cb0e29ecfb109dc2eab988f/raw/0bf84b77acb00e36138e34544430cc310e23389c/bad.log
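Since the debug output is large, a rough filter over the saved files helps to zoom in on the TLS-related lines; this is just a generic grep, not tied to any particular ingress-nginx log format:
# Crude filter: surface anything certificate/SSL related in the failing request's log.
grep -inE 'ssl|certificate|handshake' /c/Temp/bad.log | head -n 50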
EDIT 2
mark@L-R910LPKW:~$ curl -k $bad
mark@L-R910LPKW:~$ curl -kv $bad
* Trying 10.16.241.242:443...
* Connected to bad.xyz.com (10.16.241.242) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
* start date: Jan 29 15:34:09 2023 GMT
* expire date: Jan 29 15:34:09 2024 GMT
* issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
* SSL certificate verify result: self-signed certificate (18), continuing anyway.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x55d9ef165550)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /dev/master/deuremittanceservice HTTP/2
> Host: bad.xyz.com
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 404
< date: Sun, 29 Jan 2023 16:53:27 GMT
< content-length: 0
< request-context: appId=cid-v1:6ef5baff-1666-4d2a-801d-c99a97e9be30
< strict-transport-security: max-age=15724800; includeSubDomains
<
* Connection #0 to host bad.xyz.com left intact
mark@L-R910LPKW:~$
EDIT 3
~$ k -n nginx-internal-ingress logs -l 'app.kubernetes.io/component=controller' -c controller --tail 100000 | grep deuremittanceservice-master-ingress
I0129 17:52:22.050441 7 store.go:371] "Found valid IngressClass" ingress="qa-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.050590 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"qa-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"c4ce9dc9-049c-4116-9b4e-f2b83b163785", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443506", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:22.057749 7 store.go:371] "Found valid IngressClass" ingress="auto-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.058466 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"auto-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"74d82adc-2885-4ca8-bddb-6302c43851b7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443480", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:22.062629 7 store.go:371] "Found valid IngressClass" ingress="dev-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:22.063874 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"09753eea-2ac4-41d1-9f4b-3da025442f87", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184442836", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.204055 11 store.go:371] "Found valid IngressClass" ingress="auto-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.204158 11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"auto-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"74d82adc-2885-4ca8-bddb-6302c43851b7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443480", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.206366 11 store.go:371] "Found valid IngressClass" ingress="dev-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.206506 11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"09753eea-2ac4-41d1-9f4b-3da025442f87", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184442836", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0129 17:52:01.212198 11 store.go:371] "Found valid IngressClass" ingress="qa-dfpayroll/deuremittanceservice-master-ingress" ingress
I0129 17:52:01.212328 11 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"qa-dfpayroll", Name:"deuremittanceservice-master-ingress", UID:"c4ce9dc9-049c-4116-9b4e-f2b83b163785", APIVersion:"networking.k8s.io/v1", ResourceVersion:"184443506", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
~$
ANSWER
I found the root cause. There is no problem in the namespace I was checking (dev-dfpayroll). BUT the feature team owns other namespaces as well, and in one of them they created another deployment using the same FQDN (with a different ingress path, so technically that was fine), except that they botched the certificate in that ingress.
All the ingresses using the same FQDN must specify the same server certificate; if they do not, bad things happen, and that is exactly what happened here.
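With hindsight, a quick way to spot such a conflict is to list every ingress in the cluster that claims the affected host, together with the TLS secret it references; a sketch (assumes jq is available):
# One row per namespace/ingress/secret combination serving bad.xyz.com.
# Rows pointing at different (or broken) secrets are candidates for this problem.
k get ing -A -o json | jq -r '
  .items[]
  | select(any(.spec.rules[]?; .host == "bad.xyz.com"))
  | [.metadata.namespace, .metadata.name, ((.spec.tls[]?.secretName) // "-")]
  | @tsv'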
There are two ways to address this for the future:
- Configure a default certificate on the controller and let teams rely on it instead (a sketch follows below).
- Configure a policy that ensures all the ingresses using the same FQDN provide exactly the same server certificate, or none at all.
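For the first option, ingress-nginx supports a --default-ssl-certificate argument that points at a namespace/secret. A minimal sketch, assuming a hypothetical wildcard secret named wildcard-xyz-com-tls (it does not exist in this cluster yet):
# Sketch only. "wildcard-xyz-com-tls" and the cert/key file names are placeholders
# for whatever *.xyz.com certificate the teams are supposed to share.
k -n nginx-internal-ingress create secret tls wildcard-xyz-com-tls \
  --cert=xyz-com.crt --key=xyz-com.key
# Then add this argument to the controller container shown in the deployment above
# (or via the Helm chart's controller.extraArgs):
#   --default-ssl-certificate=nginx-internal-ingress/wildcard-xyz-com-tls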