Can't access ClusterIP service from pod within same cluster even with svc.cluster.local


We are trying to let multiple services communicate with each other without exposing them to the public.

I have a service like this:

apiVersion: "v1"
kind: "Service"
metadata:
  name: "config"
  namespace: "noicenamespace"
  labels:
    app: "config"
spec:
  ports:
  - protocol: "TCP"
    port: 8888
    targetPort: 8888
  selector:
    app: "config"
  type: "LoadBalancer"
  loadBalancerIP: ""

Due to type LoadBalancer, the service is accessible on the public network. But we only want this service to be visible to our internal services within the cluster.

So if I comment out loadBalancerIP and set the type to ClusterIP, my other pods can't access the service. I tried specifying the service name like this:

http://config.noicenamespace.svc.cluster.local:8888

But I get a timeout. We created the cluster from scratch on Google Kubernetes Engine.
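
For clarity, the ClusterIP variant I am testing looks like this (the same manifest as above, with the LoadBalancer-specific fields removed):

apiVersion: "v1"
kind: "Service"
metadata:
  name: "config"
  namespace: "noicenamespace"
  labels:
    app: "config"
spec:
  ports:
  - protocol: "TCP"
    port: 8888
    targetPort: 8888
  selector:
    app: "config"
  type: "ClusterIP"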

CodePudding user response:

I guess you missed the port number (8888) here, so it should be called like this:

http://config.noicenamespace.svc.cluster.local:8888

Debugging steps (see the command sketch after this list):

  1. Exec into one of the pods.
  2. Run wget -qO- http://config.noicenamespace.svc.cluster.local:8888 (or the curl equivalent), check whether you get any response, then exit the pod.
  3. Check the endpoints with kubectl get ep and note the endpoint's IP address, then try to curl that IP:PORT. If the endpoint has an IP, the pods are attached to the service and that part is wired up correctly.
  4. Check whether any network policies are in place.
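
Roughly, those steps translate to commands like the following (a sketch: <some-pod> and <endpoint-ip> are placeholders for a real pod name and the endpoint IP reported by kubectl get ep):

# 1. Exec into one of the pods
kubectl exec -it <some-pod> -n noicenamespace -- sh

# 2. From inside the pod, hit the service by its DNS name
wget -qO- http://config.noicenamespace.svc.cluster.local:8888
# or: curl http://config.noicenamespace.svc.cluster.local:8888
exit

# 3. Check that the service has endpoints, then curl one directly
kubectl get ep config -n noicenamespace
kubectl exec -it <some-pod> -n noicenamespace -- curl http://<endpoint-ip>:8888

# 4. Check whether any network policies could block the traffic
kubectl get networkpolicy -n noicenamespace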

CodePudding user response:

The error "Error from server: error dialing backend: dial timeout" is related to the progressive introduction of the Konnectivity network proxy in some clusters starting from GKE 1.19.

The Konnectivity network proxy (KNP) provides a TCP-level proxy for master egress (kube-apiserver-to-cluster communication).

The Konnectivity service consists of two parts: the Konnectivity server in the control plane network and the Konnectivity agents in the nodes network. The Konnectivity agents initiate connections to the Konnectivity server and keep them open. Once the Konnectivity service is enabled, all control-plane-to-node traffic goes through these connections, so there must be a firewall rule that allows this traffic through the relevant port (use the port number displayed in the error message, together with the IP of the endpoint); otherwise dial timeout errors may occur.
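
To confirm the Konnectivity agents are actually running before touching firewall rules, you can check their pods in kube-system (the k8s-app=konnectivity-agent label below is what GKE uses for these pods; verify it matches your cluster):

# List the konnectivity-agent pods and peek at their recent logs
kubectl get pods -n kube-system -l k8s-app=konnectivity-agent
kubectl logs -n kube-system -l k8s-app=konnectivity-agent --tail=20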

By using this filter in Cloud Logging you can find the error logs related to konnectivity-agent connection timeouts caused by the missing firewall rule (note the IP address and port number of the endpoint in the error; you will use those details in the firewall rule):

resource.labels.cluster_name="<cluster name>"
"konnectivity-agent"

Add a firewall egress rule that allows connections to that port (again, use the port number displayed in the error message together with the IP of the endpoint). You could use the following command to add the rule; it should allow the konnectivity-agent to connect to the control plane.

gcloud compute firewall-rules create gke-node-to-konnectivity-service \
--allow=tcp:<port number> \
--direction=EGRESS \
--destination-ranges=<endpoint IP address> \
--target-tags=<node network tag> \
--priority=1000
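
After creating the rule, you can confirm it exists and check its settings with:

gcloud compute firewall-rules describe gke-node-to-konnectivity-service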