Is the service affected when all masters are stopped?
OpenShift 4
- 3 infra nodes
- 3 master nodes
- 3 worker nodes
※ The router pods run on the infra nodes.
The request flow is as follows.
frontend(DC) -> api(DC)
- Internet -> Infra Node (Router) -> SDN -> Worker Node
- Route: www.frontend.test.com (443)
- api REST call from the frontend (HttpUrlConnection or HttpClient 4.x, to api:8080 or api.test1.svc.cluster.local:8080); see the sketch below
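For clarity, the in-cluster REST call described above looks roughly like the following. This is a sketch only: the service name, port, and GET path come from this post, while the timeouts and surrounding code are assumed.

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ApiCall {
    public static void main(String[] args) throws IOException {
        // Service DNS name from the post; "http://api:8080/" is the short in-namespace form.
        URL url = new URL("http://api.test1.svc.cluster.local:8080/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Illustrative timeouts; the real client configuration is not shown in the post.
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);

        // The DNS lookup for the service name happens when the connection is actually
        // opened (on getInputStream below); that is where UnknownHostException surfaces
        // when the cluster DNS cannot be reached.
        try (InputStream in = conn.getInputStream()) {
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }
}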
When all master nodes are stopped, the front-end itself always loads, but the API call from it fails intermittently: it either hangs for a long time or throws an UnknownHostException. Everything is fine as long as at least one master node is still running. The UnknownHostException case looks like this:
GET http://api.test1.svc.cluster.local:8080/ : java.net.UnknownHostException: api.test1.svc.cluster.local
or
GET http://api:8080/ : java.net.UnknownHostException: api
When the call hangs instead, the thread dump looks like this:
"http-nio-8080-exec-2" #33 daemon prio=5 os_prio=0 cpu=6.89ms elapsed=452.97s tid=0x00007fdb88f15800 nid=0xb3 runnable [0x00007fdb34fc2000]
java.lang.Thread.State: RUNNABLE
at java.net.Inet6AddressImpl.lookupAllHostAddr(java.base@…/Native Method)
at java.net.InetAddress$PlatformNameService.lookupAllHostAddr(java.base@…/InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(java.base@…/InetAddress.java:1519)
at java.net.InetAddress$NameServiceAddresses.get(java.base@…/InetAddress.java:848)
- locked <0x0000000782ff5be0> (a java.net.InetAddress$NameServiceAddresses)
at java.net.InetAddress.getAllByName0(java.base@…/InetAddress.java:1509)
at java.net.InetAddress.getAllByName(java.base@…/InetAddress.java:1368)
at java.net.InetAddress.getAllByName(java.base@…/InetAddress.java:1302)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
at
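For context: the resolver at the bottom of that trace is HttpClient 4.x's default DnsResolver, which just delegates to the JVM's blocking name lookup. A functionally equivalent sketch (not the library source) would be:

import java.net.InetAddress;
import java.net.UnknownHostException;

import org.apache.http.conn.DnsResolver;

// Sketch of what HttpClient's default resolver does: every resolve() ends up in
// InetAddress.getAllByName, which blocks in native code with no client-side timeout.
// That matches the long-running RUNNABLE thread in the dump above.
public class BlockingDnsResolver implements DnsResolver {
    @Override
    public InetAddress[] resolve(final String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }
}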
Thank you.
CodePudding user response:
Yes, it is affected. Here's what's happening when you stop all the master nodes:
- The incoming traffic to the DC egress is forwarded to the Ingress component of your Kubernetes/OpenShift cluster (your front-end).
- This part succeeds because name resolution for the Ingress is the responsibility of your infrastructure, since the Ingress interface is external to OpenShift.
- Once the traffic reaches the Ingress (the front-end is successfully reached), it needs to be forwarded on to the backend service, depending on the path in the request.
- This is what cannot be done. Ingress objects, by design, dynamically resolve Service DNS names into IP addresses in order to reach them inside the cluster. This is done so that when services go down, come back up, and their IP addresses change, the Ingress doesn't need to be reconfigured, because the DNS name stays consistent.
- Here, that resolution fails because your DNS system (probably core-dns) is supposed to be running on the master nodes, which it isn't, and this leads to the unresolved-name behavior you are seeing.
- Sometimes the Ingress still has a valid local resolver cache entry, the request makes it to the Service, and you get a response. But this is highly unreliable, since that cache is typically configured with an auto-clean timeout and its entries are purged automatically after a while.
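As a side note, a similar cache-and-expire effect also exists on the Java client side: the JVM caches successful name lookups for a configurable TTL, so a name that resolved while DNS was still reachable may keep working until the cached entry expires. A minimal sketch, with illustrative TTL values (these settings are not from the original post):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

public class DnsCacheDemo {
    public static void main(String[] args) {
        // JVM-wide name-service cache settings; they must be set before the first lookup.
        // 30s for successful lookups, 5s for negative ones (values chosen for illustration).
        Security.setProperty("networkaddress.cache.ttl", "30");
        Security.setProperty("networkaddress.cache.negative.ttl", "5");

        try {
            // The first call goes to the cluster DNS; repeat calls within the TTL are
            // served from the JVM cache even if DNS has since become unreachable.
            for (InetAddress a : InetAddress.getAllByName("api.test1.svc.cluster.local")) {
                System.out.println(a.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // The failure mode reported in the question.
            System.err.println("resolution failed: " + e.getMessage());
        }
    }
}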