I'm running Selenium Grid 4.1.2 (the most recent version) in a Kubernetes cluster.
In many cases (I would say about half), when I execute a test through the grid, the node fails to kill the processes and does not return to idle. The container then keeps using a full CPU core until I kill it manually.
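When a node gets stuck like this, it helps to know which process is actually burning the CPU. The following is a minimal sketch, assuming a Linux container with a Python 3 interpreter available (the node image may not ship one; `top` or `ps` give the same information if present). It reads `/proc/<pid>/stat` directly and lists the processes with the most accumulated CPU time. `ProcInfo`, `read_proc` and `busiest` are hypothetical helpers, not part of Selenium.

```python
import os
from typing import List, NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    ppid: int
    state: str
    comm: str
    cpu_ticks: int  # utime + stime accumulated since the process started


def read_proc(pid: int) -> ProcInfo:
    """Parse /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so take everything between the first "(" and the last ")".
    comm = data[data.index("(") + 1 : data.rindex(")")]
    fields = data[data.rindex(")") + 2 :].split()
    # fields[0] is field 3 of proc(5) ("state"), so field N is fields[N - 3].
    return ProcInfo(
        pid=pid,
        ppid=int(fields[1]),  # field 4: parent PID
        state=fields[0],  # field 3: R, S, Z, ...
        comm=comm,
        cpu_ticks=int(fields[11]) + int(fields[12]),  # fields 14 (utime) and 15 (stime)
    )


def busiest(limit: int = 5) -> List[ProcInfo]:
    """Return the processes with the most accumulated CPU time."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            procs.append(read_proc(int(entry)))
        except OSError:
            continue  # the process exited (or became unreadable) while scanning
    return sorted(procs, key=lambda p: p.cpu_ticks, reverse=True)[:limit]


if __name__ == "__main__":
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    for p in busiest():
        seconds = p.cpu_ticks / ticks_per_second
        print(f"pid={p.pid:<6} ppid={p.ppid:<6} state={p.state} cpu={seconds:8.1f}s {p.comm}")
```

Because the counters are cumulative, a stuck process shows up by growing steadily between two runs. A `chrome` or `chromedriver` entry whose parent is no longer chromedriver or the Java node process (for example, one reparented to PID 1) would suggest a leaked child.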
The log in the container is the following:
10:51:34.781 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:51:35.680 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
Starting ChromeDriver 98.0.4758.102 (273bf7ac8c909cde36982d27f66f3c70846a3718-refs/branch-heads/4758@{#1151}) on port 39592
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
[1646129123.987][SEVERE]: bind() failed: Cannot assign requested address (99)
ChromeDriver was started successfully.
11:08:24.970 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "99100300a4e6b4fe2afe5891b50def09","eventTime": 1646129304968456597,"eventName": "No slot matched the requested capabilities. ","attributes"
11:08:44.672 INFO [OsProcess.destroy] - Unable to drain process streams. Ignoring but the exception being swallowed follows.
org.apache.commons.exec.ExecuteException: The stop timeout of 2000 ms was exceeded (Exit value: -559038737)
at org.apache.commons.exec.PumpStreamHandler.stopThread(PumpStreamHandler.java:295)
at org.apache.commons.exec.PumpStreamHandler.stop(PumpStreamHandler.java:180)
at org.openqa.selenium.os.OsProcess.destroy(OsProcess.java:135)
at org.openqa.selenium.os.CommandLine.destroy(CommandLine.java:152)
at org.openqa.selenium.remote.service.DriverService.stop(DriverService.java:281)
at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:183)
at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:65)
at org.openqa.selenium.grid.node.local.SessionSlot.apply(SessionSlot.java:143)
at org.openqa.selenium.grid.node.local.LocalNode.newSession(LocalNode.java:314)
at org.openqa.selenium.grid.node.NewNodeSession.execute(NewNodeSession.java:52)
at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.security.RequiresSecretFilter.lambda$apply$0(RequiresSecretFilter.java:64)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
11:08:44.673 ERROR [OsProcess.destroy] - Unable to kill process Process[pid=75, exitValue=143]
11:08:44.675 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "99100300a4e6b4fe2afe5891b50def09","eventTime": 1646129316638154262,"eventName": "exception","attributes": {"driver.url": "http:\u002f\u002f
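The numbers in this log can be decoded to narrow things down. The sketch below, assuming Linux errno numbering, shows that (99) is `EADDRNOTAVAIL`; that `exitValue=143` is 128 + 15, i.e. the process ended on `SIGTERM` (Java reports signal deaths this way on Unix); and that `-559038737` is `0xdeadbeef` read as a signed 32-bit integer, which appears to be Apache Commons Exec's `Executor.INVALID_EXITVALUE` sentinel rather than a real exit code. The helper names are hypothetical.

```python
import errno
import os
import signal


def describe_errno(code: int) -> str:
    """Map a numeric errno (Linux numbering) to its symbolic name and message."""
    return f"{errno.errorcode.get(code, '?')}: {os.strerror(code)}"


def describe_exit_status(status: int) -> str:
    """Interpret an exit value reported as 128 + signal number."""
    if status > 128:
        try:
            return f"terminated by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number, fall through
    return f"exited with code {status}"


def as_unsigned_hex(value: int) -> str:
    """Reinterpret a signed 32-bit int (as Java prints it) as unsigned hex."""
    return hex(value & 0xFFFFFFFF)


# From "bind() failed: Cannot assign requested address (99)"
print(describe_errno(99))  # on Linux/glibc: EADDRNOTAVAIL: Cannot assign requested address
# From "Process[pid=75, exitValue=143]"
print(describe_exit_status(143))  # terminated by SIGTERM
# From "(Exit value: -559038737)"
print(as_unsigned_hex(-559038737))  # 0xdeadbeef
```

Read together, one plausible interpretation (not confirmed by the log) is that chromedriver itself did exit on SIGTERM, but something it spawned still held its output pipes open, so draining the streams timed out after 2000 ms and the node reported the kill as failed. The `bind()` message is commonly reported when IPv6 is unavailable inside a container and is usually harmless; that is also an assumption here.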
Here's an excerpt from the Kubernetes manifest:
- name: selenium-node-chrome
  image: selenium/node-chrome:latest
  ...
  env:
    - name: TZ
      value: Europe/Berlin
    - name: START_XVFB
      value: "false"
    - name: SE_NODE_OVERRIDE_MAX_SESSIONS
      value: "true"
    - name: SE_NODE_MAX_SESSIONS
      value: "1"
  envFrom:
    - configMapRef:
        name: selenium-event-bus-config
  ...
  volumeMounts:
    - name: dshm
      mountPath: /dev/shm
...
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
The selenium-event-bus-config ConfigMap contains the following variables:
data:
  SE_EVENT_BUS_HOST: selenium-hub
  SE_EVENT_BUS_PUBLISH_PORT: "4442"
  SE_EVENT_BUS_SUBSCRIBE_PORT: "4443"
Did I misconfigure anything? Does anyone have an idea how to fix this?
Answer:
The issue was fixed by removing the environment variable START_XVFB=false. The Grid seems to work reliably without it, so this fixes it for me for now.
@mikołaj-głodziak suggested that a Kubernetes upgrade might fix this. I'm currently running a bare-metal 1.19.8 cluster and will upgrade to a more recent version soon. Afterwards, I'll retry disabling Xvfb and report back here.
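To check whether nodes really return to idle after the upgrade (with or without `START_XVFB`), the Grid's `/status` endpoint can be polled. A minimal sketch, assuming the hub is reachable at `http://selenium-hub:4444/status` (4444 is Selenium's default port; the Service name and port are assumptions) and that the response follows the Grid 4 layout `value.nodes[].slots[].session`, with `session` null for an idle slot. `fetch_status` and `busy_slots_per_node` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: hub/router Service name and port; adjust to your cluster.
STATUS_URL = "http://selenium-hub:4444/status"


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch and decode the Grid /status JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def busy_slots_per_node(status: Dict[str, Any]) -> Dict[str, int]:
    """Count slots that still hold a session, keyed by node URI.

    Assumes the Grid 4 layout value.nodes[].slots[].session,
    where an idle slot has session == null (None in Python).
    """
    result: Dict[str, int] = {}
    for node in status.get("value", {}).get("nodes", []):
        busy = sum(1 for slot in node.get("slots", []) if slot.get("session"))
        result[node.get("uri", "?")] = busy
    return result
```

For example, `busy_slots_per_node(fetch_status())`, run from a pod inside the cluster a minute after the test suite finishes, should report 0 for every node; a node stuck at 1 matches the symptom described in the question.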