How do I use rvest with an element id name that contains a forward slash?

I'm trying to use rvest to scrape an element whose id contains a forward slash. Everything I try as an escape character fails. Suppose that the element I'm trying to select is

<div id ="hello/world"> Some stuff </div>

Using rvest functions, after reading the webpage into a variable called "html", I'm running things like this:

x <- html %>% 
  html_elements("#hello//world")

I've done it using no escape character, different escape characters, etc. But everything I try generates the error:

Error in tokenize(css) : Unexpected character '/' found at position 8.

Any ideas? Big thanks for any help.

CodePudding user response：

It appears that you are matching on the value of an id attribute, and the CSS id shorthand (#...) fails to parse the slash. Perhaps you can try an XPath attribute test instead:

x <- html %>% 
  html_elements(xpath = "//div[@id='hello/world']")
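
To check that suggestion without the real page, here is a minimal self-contained sketch. It assumes the page can be stood in for by the one-line snippet from the question, and the variable name node is just for illustration. The id value sits inside a quoted XPath string, so the slash needs no escaping.

```r
library(rvest)

# Stand-in for the real page: the single div from the question.
html <- read_html('<div id="hello/world"> Some stuff </div>')

# Match on the attribute value; the slash is ordinary text inside the quotes.
node <- html %>%
  html_elements(xpath = "//div[@id='hello/world']")

html_attr(node, "id")          # the id value of the matched element
html_text(node, trim = TRUE)   # its text with surrounding whitespace removed
```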

CodePudding user response：

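You don't need to escape the slash if you quote the id inside an attribute selector. This works with html_nodes(), and equally with html_elements(), which superseded html_nodes() in rvest 1.0.0: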

library(rvest)

html <- read_html('<div id="hello/world"> Some stuff </div>')
html %>% 
  html_nodes("div[id='hello/world']")

Result:

{xml_nodeset (1)}
[1] <div id="hello/world"> Some stuff </div>
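
If you need this for several ids, the attribute-selector approach above could be wrapped in a small helper. This is only a sketch: select_by_id is a hypothetical function (not part of rvest), the second div is made-up test data, and the helper assumes the id contains no single quote because the value is wrapped in single quotes.

```r
library(rvest)

# Hypothetical helper, not an rvest function: select elements by exact id
# using an attribute selector, so characters like "/" need no escaping.
select_by_id <- function(doc, id, tag = "*") {
  # Assumption: the id has no single quote, which would break the quoting.
  stopifnot(!grepl("'", id, fixed = TRUE))
  html_elements(doc, sprintf("%s[id='%s']", tag, id))
}

# Made-up two-element page to show that only the requested id matches.
html <- read_html('<div id="hello/world"> Some stuff </div><div id="other">x</div>')

hits <- select_by_id(html, "hello/world")
length(hits)                   # number of matching elements
html_text(hits, trim = TRUE)   # text of the matches
```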