Is the service affected when all masters are stopped?



OpenShift 4

  • Infra nodes: 3
  • Master nodes: 3
  • Worker nodes: 3

※ Router pods run on the infra nodes.

The request flow is as follows (a minimal sketch of the api call follows the list).

  • frontend(DC) -> api(DC)

    • Internet -> Infra Node(Router) -> SDN -> Worker Node
  • www.frontend.test.com(443) - Route

    • api REST Call (HttpUrlConnection or HttpClient 4.x, api:8080 or api.test1.svc.cluster.local:8080)
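
For reference, the api call itself looks roughly like the sketch below (illustrative only: the class name is made up, the host and port are the ones listed above, HttpClient 4.x as mentioned).

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// Illustrative sketch: the frontend pod calling the api service by its cluster DNS name.
public class ApiCallSketch {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // Resolving "api.test1.svc.cluster.local" happens inside execute(),
            // via the cluster DNS service - this is the step that fails.
            HttpGet get = new HttpGet("http://api.test1.svc.cluster.local:8080/");
            try (CloseableHttpResponse response = client.execute(get)) {
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            }
        }
    }
}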

When all master nodes are stopped, the front-end always succeeds. However, the API call fails intermittently.

The failure shows up as either a slow hang or an UnknownHostException (messages below).

Everything works as long as at least one master node is up.

GET http://api.test1.svc.cluster.local:8080/ : java.net.UnknownHostException: api.test1.svc.cluster.local
or
GET http://api:8080/ : java.net.UnknownHostException: api
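
Both messages come from the name lookup itself, before any HTTP request goes out. The lookup can be isolated with a quick check like the one below (a sketch to run inside the frontend pod; the host names are the same two used above).

import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: resolve the service names directly to see whether cluster DNS answers at all.
public class ResolveCheck {
    public static void main(String[] args) {
        for (String host : new String[] { "api", "api.test1.svc.cluster.local" }) {
            long start = System.nanoTime();
            try {
                InetAddress addr = InetAddress.getByName(host);
                System.out.printf("%s -> %s (%d ms)%n",
                        host, addr.getHostAddress(), (System.nanoTime() - start) / 1_000_000);
            } catch (UnknownHostException e) {
                // The same exception the application sees when the lookup fails quickly.
                System.out.printf("%s -> UnknownHostException (%d ms)%n",
                        host, (System.nanoTime() - start) / 1_000_000);
            }
        }
    }
}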

Thread dump when the call is slow:

"http-nio-8080-exec-2" #33 daemon prio=5 os_prio=0 cpu=6.89ms elapsed=452.97s tid=0x00007fdb88f15800 nid=0xb3 runnable  [0x00007fdb34fc2000]
   java.lang.Thread.State: RUNNABLE
    at java.net.Inet6AddressImpl.lookupAllHostAddr(java.base/Native Method)
    at java.net.InetAddress$PlatformNameService.lookupAllHostAddr(java.base/InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(java.base/InetAddress.java:1519)
    at java.net.InetAddress$NameServiceAddresses.get(java.base/InetAddress.java:848)
    - locked <0x0000000782ff5be0> (a java.net.InetAddress$NameServiceAddresses)
    at java.net.InetAddress.getAllByName0(java.base/InetAddress.java:1509)
    at java.net.InetAddress.getAllByName(java.base/InetAddress.java:1368)
    at java.net.InetAddress.getAllByName(java.base/InetAddress.java:1302)
    at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
    at
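
The dump shows the thread sitting inside the native lookupAllHostAddr call; InetAddress itself has no per-lookup timeout, so the thread blocks for as long as the OS resolver keeps retrying. For what it's worth, the hang can be bounded on the application side by running the lookup on a separate thread with a timeout (a rough sketch; the 2-second limit is an arbitrary choice).

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound a DNS lookup with a timeout, since InetAddress.getByName()
// can block for as long as the OS resolver keeps retrying.
public class BoundedLookup {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    static InetAddress resolve(String host, long timeoutMillis) throws Exception {
        Future<InetAddress> future = POOL.submit(() -> InetAddress.getByName(host));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new UnknownHostException(host + " (lookup timed out)");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("api.test1.svc.cluster.local", 2000));
    }
}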

Thank you.

CodePudding user response:

Yes, it is affected. Here's what's happening when you stop all the master nodes.

  1. The incoming traffic is forwarded to the Ingress component of your Kubernetes/OpenShift cluster, which fronts your front-end.
  2. This succeeds because resolving the Ingress hostname (www.frontend.test.com) is the responsibility of your external infrastructure; that interface sits outside OpenShift.
  3. Once the traffic reaches the Ingress (the front-end is successfully reached), it now needs to be forwarded to the backend service, depending on the path in the request.
  4. This cannot be done, because by design the Service DNS name is resolved dynamically into an IP address in order to reach the service inside the cluster. That way, when a service goes down and comes back up with a different IP address, nothing needs to be reconfigured, since the DNS name stays the same.
  5. Here the resolution fails, because your DNS system (probably CoreDNS) is supposed to be running on the master nodes, which it isn't, and that leads to the unresolved-name behavior.
  6. Sometimes the resolver still has a valid local cache entry, the request makes it to the Service, and you get a response. But this is unreliable, since the cache is typically set up with an auto-clean timeout and entries are purged automatically after a while (see the notes after this list).
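
If the goal is to ride out short DNS outages on the client side, one option (a sketch, not something from the question; the fallback policy and the class name here are assumptions) is to give HttpClient 4.x a DnsResolver that remembers the last successful answer per host and falls back to it when a fresh lookup fails.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.DnsResolver;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.impl.conn.SystemDefaultDnsResolver;

// Sketch: a DnsResolver that falls back to the last successful answer
// when the cluster DNS stops responding.
public class LastKnownGoodDnsResolver implements DnsResolver {
    private final DnsResolver delegate = SystemDefaultDnsResolver.INSTANCE;
    private final Map<String, InetAddress[]> lastGood = new ConcurrentHashMap<>();

    @Override
    public InetAddress[] resolve(String host) throws UnknownHostException {
        try {
            InetAddress[] addresses = delegate.resolve(host);
            lastGood.put(host, addresses);   // remember the good answer
            return addresses;
        } catch (UnknownHostException e) {
            InetAddress[] cached = lastGood.get(host);
            if (cached != null) {
                return cached;               // ride out the DNS outage
            }
            throw e;                         // nothing cached, fail as usual
        }
    }

    // Wire the resolver into an HttpClient 4.x instance.
    public static CloseableHttpClient buildClient() {
        Registry<ConnectionSocketFactory> registry = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", PlainConnectionSocketFactory.getSocketFactory())
                .build();
        PoolingHttpClientConnectionManager cm =
                new PoolingHttpClientConnectionManager(registry, new LastKnownGoodDnsResolver());
        return HttpClients.custom().setConnectionManager(cm).build();
    }
}

Independently of that, the JVM keeps its own resolver cache, controlled by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl, which determine how long successful and failed lookups are reused.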