How to access WebElement from executeScript in RSelenium?

Time:10-25

I want to extract data from a website that uses shadow DOM. I think I've managed to access the elements inside the shadow DOM using JavaScript, but I haven't figured out how to use the value returned from the JavaScript as WebElements so that I can process the data.

library(RSelenium)

rD <- rsDriver(browser="firefox", port=4547L, verbose=F)
remDr <- rD[["client"]]

remDr$navigate("https://www.transfermarkt.us")

## run script to enable dropdown list in the website. This creates a <ul> tag in the shadow-dom which lists all items in the dropdown list.
remDr$executeScript("return document.querySelector('tm-quick-select-bar').setAttribute('dropdown-visible', 'countries')")
Sys.sleep(5)

This selects only the element that hosts the shadow DOM. I'm not sure if this is required, but this is where the dropdown list lives.

wrapper <- remDr$findElement(using="tag name", value="tm-quick-select-bar")

Below is the script to access the dropdown list

script <- 'return document.querySelector("#main > header > div.quick-select-wrapper > tm-quick-select-bar").shadowRoot.querySelector("div > tm-quick-select:nth-child(2) > div > div.selector-dropdown > ul");'

test <- remDr$executeScript(script, list(wrapper))

This returns the following list.

> test                                                                                    
$`element-6066-11e4-a52e-4f735466cecf`                                                    
[1] "4adac8f8-2c94-4e48-b7a3-521eb961ef8c"  

I have no idea how to extract the items from this. It doesn't seem like it's a WebElement. What is this list and what information does it contain? How can I extract it?

I tried this

lapply(test, function(x){
    x$getElementText()
    x[[1]]$getElementText()
})

But it returns this error:

Error in x$getElementText : $ operator is invalid for atomic vectors      

CodePudding user response:

I'm not sure whether Selenium can deal with shadow DOM; there is a plugin that supposedly solves that for Java. Nevertheless, you can extract the innerHTML and process it with rvest:

library(RSelenium)

rD <- rsDriver(browser="chrome", port=4547L, verbose=F, chromever="106.0.5249.21")
remDr <- rD[["client"]]

remDr$navigate("https://www.transfermarkt.us")

## run script to enable dropdown list in the website. This creates a <ul> tag in the shadow-dom which lists all items in the dropdown list.
remDr$executeScript("return document.querySelector('tm-quick-select-bar').setAttribute('dropdown-visible', 'countries')")
Sys.sleep(5)


wrapper <- remDr$findElement(using="tag name", value="tm-quick-select-bar")

script <- paste0(
  'return document.querySelector("#main > header > div.quick-select-wrapper > tm-quick-select-bar")',
  '.shadowRoot.querySelector("div > tm-quick-select:nth-child(2) > div > div.selector-dropdown > ul")',
  '.innerHTML;')

test <- remDr$executeScript(script)

html <- rvest::read_html(test[[1]])

rvest::html_text(html)

# " Afghanistan Albania Algeria American Samoa American .....
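As a variation, the script could return plain strings instead of an HTML fragment or an element handle, which would skip the parsing step. The sketch below is only illustrative: collectItemTexts is a hypothetical helper name, and it assumes the dropdown's <ul> holds <li> children, which the page may not guarantee. For context, the element-6066-11e4-a52e-4f735466cecf key in the question's output is the identifier the W3C WebDriver protocol uses for element references, so that list is an opaque reference to the <ul>, not its contents.

```javascript
// Sketch: collect the trimmed text of each <li> inside a shadow-DOM list.
// `root` is anything that exposes querySelector (in the browser: `document`).
// The "li" child selector is an assumption about the page's markup.
function collectItemTexts(root, hostSelector, listSelector) {
  const host = root.querySelector(hostSelector);
  if (!host || !host.shadowRoot) return [];   // no open shadow host found
  const list = host.shadowRoot.querySelector(listSelector);
  if (!list) return [];                       // list not rendered (yet)
  // Plain strings serialize as ordinary values, unlike element handles.
  return Array.from(list.querySelectorAll("li"), (li) => li.textContent.trim());
}
```

To try it from R, the function text followed by a final line such as return collectItemTexts(document, ...) with the selectors from the question would go into the string passed to remDr$executeScript. This is untested against the live site.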

CodePudding user response:

I don't know R, but for example:

let shadowEls = [...document.querySelectorAll('*')].filter(el => el.shadowRoot)
return shadowEls[0].shadowRoot.innerHTML

That should be enough to figure this bit out.
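A slightly fuller sketch of the same idea, in case the first shadow host is not the one you need: it reports every host with its tag name and shadow-tree markup so you can pick the right one. describeShadowHosts is a hypothetical name. Note that querySelectorAll('*') does not descend into shadow trees, so hosts nested inside another shadow root are not found, and closed shadow roots report shadowRoot as null and are skipped.

```javascript
// Sketch: list every open shadow host under `root` with its shadow markup.
// `root` is anything that exposes querySelectorAll (in the browser: `document`).
function describeShadowHosts(root) {
  return Array.from(root.querySelectorAll("*"))
    .filter((el) => el.shadowRoot)            // keep only elements with an open shadow root
    .map((el) => ({
      tag: el.tagName.toLowerCase(),          // e.g. "tm-quick-select-bar"
      html: el.shadowRoot.innerHTML,          // markup inside the shadow tree
    }));
}
```

Inside executeScript this would end with return describeShadowHosts(document); the result should come back as a list of tag/html pairs rather than element references.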
