OpenShift/K8s issue with project pods not joining same grid, but rather create multiple isolated gri

Time:02-17

I have an issue when an OpenShift project is deployed with an autoscaler configuration like this:

  • Min Pods = 10
  • Max Pods = 15

I can see that the deployer immediately creates 5 pods, and TcpDiscoveryKubernetesIpFinder creates not one grid but multiple grids with the same igniteInstanceName.

This issue can be solved with the following workaround.

I changed the autoscaler configuration to start with ONE pod:

  • Min Pods = 1
  • Max Pods = 15

And then scale up to 10 pods (or replicas=10):

  • Min Pods = 10
  • Max Pods = 15

It looks like TcpDiscoveryKubernetesIpFinder does not take any lock when it reads data from the Kubernetes service that maintains the list of IP addresses of all project pods. So when multiple pods are started simultaneously, multiple grids are created. But when ONE pod is started first and a grid is created with this pod, the newly autoscaled pods join this existing grid.
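The suspected race can be illustrated with a toy model. This is only a sketch and not Ignite's actual discovery code: the `Registry` class, the `pod-N` names, and the rule "an empty address list means start a new grid" are assumptions made for illustration. It compares pods that all read the address list before any of them has registered with pods that start one after another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class DiscoveryRaceSketch {
    /** Hypothetical stand-in for the address list exposed by the Kubernetes service. */
    static final class Registry {
        private final List<String> addresses = new ArrayList<>();
        List<String> snapshot() { return new ArrayList<>(addresses); }
        void register(String address) { addresses.add(address); }
    }

    /** A node that sees no peers starts its own grid; otherwise it joins the first peer's grid. */
    static void join(String self, List<String> seen, Map<String, String> gridOf) {
        if (seen.isEmpty()) {
            gridOf.put(self, "grid-of-" + self);
        } else {
            gridOf.put(self, gridOf.get(seen.get(0)));
        }
    }

    /** All nodes read the registry before any node has registered (simultaneous start). */
    static int countGridsSimultaneous(int nodes) {
        Registry registry = new Registry();
        List<List<String>> views = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            views.add(registry.snapshot()); // every view is empty
        }
        Map<String, String> gridOf = new HashMap<>();
        for (int i = 0; i < nodes; i++) {
            String self = "pod-" + i;
            join(self, views.get(i), gridOf); // acts on the stale view
            registry.register(self);
        }
        return new HashSet<>(gridOf.values()).size();
    }

    /** Each node reads and registers before the next one starts (one pod first, then scale up). */
    static int countGridsStaggered(int nodes) {
        Registry registry = new Registry();
        Map<String, String> gridOf = new HashMap<>();
        for (int i = 0; i < nodes; i++) {
            String self = "pod-" + i;
            join(self, registry.snapshot(), gridOf);
            registry.register(self);
        }
        return new HashSet<>(gridOf.values()).size();
    }

    public static void main(String[] args) {
        System.out.println("simultaneous: " + countGridsSimultaneous(10) + " grids");
        System.out.println("staggered:    " + countGridsStaggered(10) + " grids");
    }
}
```

In this model the simultaneous start is the worst case, where every pod ends up alone. A real deployment would more likely show a few partial grids, depending on timing.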

PS: There are no issues with ports 47100 or 47500; communication and discovery are working.

CodePudding user response:

The OP confirmed in a comment that the problem is resolved:

Thank you, let me know when TcpDiscoveryKubernetesIpFinder early adoption fix will be available. For now I've switched my Openshift micro-service IgniteConfiguration#discoverySpi to TcpDiscoveryJdbcIpFinder - which solved this issue (as it has this kind of lock, transactionIsolation=READ_COMMITTED).

You can read more about TcpDiscoveryJdbcIpFinder in the Apache Ignite documentation.

CodePudding user response:

Thanks for the information, indeed this might happen if multiple nodes have been started simultaneously. I've filed IGNITE-16568 to keep track of it.

In the meantime, there are multiple workarounds. One of them is to use a different IP finder, as you did with TcpDiscoveryJdbcIpFinder.

Another option that I suppose will work is to configure a readinessProbe, and to set initialDelaySeconds if required. It's always recommended to have the probes configured; here is an example of their configuration in Apache Ignite:

readinessProbe:
    httpGet:
        path: /ignite?cmd=probe
        port: 8080
    initialDelaySeconds: 5
    failureThreshold: 3
    periodSeconds: 10
    timeoutSeconds: 10
livenessProbe:
    httpGet:
        path: /ignite?cmd=version
        port: 8080
    initialDelaySeconds: 5
    failureThreshold: 3
    periodSeconds: 10
    timeoutSeconds: 10