I'm trying to use rvest to scrape an element whose id contains a forward slash. Everything I try as an escape character fails. Suppose that the element I'm trying to select is
<div id="hello/world"> Some stuff </div>
Using rvest functions, after reading the webpage into a variable called "html", I'm running things like this:
x <- html %>%
  html_elements("#hello//world")
I've tried it with no escape character and with several different escape characters, but every attempt generates the error:
Error in tokenize(css) : Unexpected character '/' found at position 8.
Any ideas? Big thanks for any help.
CodePudding user response:
It appears that you are matching on the value of an id attribute, not on an element name. You could try XPath instead, where a slash inside a quoted string needs no escaping:
x <- html %>%
  html_elements(xpath = "//div[@id='hello/world']")
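For anyone who wants to try this without a live page, here is a minimal self-contained sketch. It assumes the page contains nothing but the div from the question; the inline HTML string and the variable name `found_id` are only for illustration.

```r
library(rvest)

# Stand-in for the real page: just the element from the question.
html <- read_html('<div id="hello/world"> Some stuff </div>')

# Inside an XPath string literal the slash is an ordinary character,
# so the id can be compared as-is.
x <- html %>%
  html_elements(xpath = "//div[@id='hello/world']")

# Read the id back from the matched node to confirm the match.
found_id <- html_attr(x, "id")
print(found_id)
```

If the element is not always a div, `//*[@id='hello/world']` should match any tag with that id.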
CodePudding user response:
I think you should be using html_nodes with a quoted attribute selector:
library(rvest)
html <- read_html('<div id="hello/world"> Some stuff </div>')
html %>%
  html_nodes("div[id='hello/world']")
Result:
{xml_nodeset (1)}
[1] <div id="hello/world"> Some stuff </div>
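As far as I can tell, what makes this work is the quoted attribute selector rather than the function name. Since rvest 1.0.0, html_nodes has been superseded by html_elements, and the same selector should work with either. A small sketch under that assumption; the variable names `by_css` and `by_any_tag` are only for illustration.

```r
library(rvest)

html <- read_html('<div id="hello/world"> Some stuff </div>')

# The id value sits inside a quoted string, so the slash never
# reaches the CSS identifier tokenizer that rejects "#hello/world".
by_css <- html %>%
  html_elements("div[id='hello/world']")

# Dropping the tag name matches any element carrying that id.
by_any_tag <- html %>%
  html_elements("[id='hello/world']")

print(by_css)
print(html_attr(by_any_tag, "id"))
```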