How do I use rvest with an element id name that contains a forward slash?

Time:08-09

I'm trying to use rvest to scrape an element whose id contains a forward slash, and every escape character I try fails. Suppose the element I'm trying to select is

<div id="hello/world"> Some stuff </div>

After reading the web page into a variable called "html", I'm calling rvest functions like this:

x <- html %>% 
  html_elements("#hello//world")

I've tried it with no escape character, with various other escape characters, and so on, but every attempt produces the error:

Error in tokenize(css) : Unexpected character '/' found at position 8.

Any ideas? Big thanks for any help.
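For reference, here is a minimal self-contained reproduction, assuming the single div above is the whole page. Judging by the error text (tokenize(css)), the failure happens while the CSS selector is being parsed, not while the page is being searched:

```r
library(rvest)

# Minimal page containing only the element from the question
html <- read_html('<div id="hello/world"> Some stuff </div>')

# The "/" in the selector makes the CSS tokenizer fail; capture the message
err <- tryCatch(
  html_elements(html, "#hello//world"),
  error = function(e) conditionMessage(e)
)
print(err)
```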

Answer:

The CSS parser rejects the selector because an unescaped / is not allowed in an id selector. Perhaps you can match the id attribute with XPath instead, where the value is just a quoted string:

x <- html %>% 
  html_elements(xpath = "//div[@id='hello/world']")
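A self-contained version of this approach, assuming the same markup as in the question, parsed from a string so it runs without a live page:

```r
library(rvest)

# Same markup as in the question
html <- read_html('<div id="hello/world"> Some stuff </div>')

# XPath compares the attribute value as a quoted string, so the slash needs no escaping
x <- html_elements(html, xpath = "//div[@id='hello/world']")

# Text of the matched element with the surrounding spaces trimmed
trimws(html_text(x))
```

This should print "Some stuff".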

Answer:

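I think the fix is in the selector rather than the function (html_nodes() is the older name for html_elements()). An attribute selector with the id value in quotes accepts the slash: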

library(rvest)

html <- read_html('<div id="hello/world"> Some stuff </div>')
html %>% 
  html_nodes("div[id='hello/world']")

Result:

{xml_nodeset (1)}
[1] <div id="hello/world"> Some stuff </div>
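Another option, sketched here on the assumption that rvest's CSS-to-XPath translation (handled by the selectr package) honours standard CSS backslash escapes: keep the id selector and escape the slash as \/. Because the selector is written as an R string literal, the backslash itself has to be doubled. This would also explain why "#hello//world" fails: in CSS a second slash is not an escape.

```r
library(rvest)

html <- read_html('<div id="hello/world"> Some stuff </div>')

# CSS "\/" is a literal slash inside the id; in R source that is written "\\/".
# Assumes the CSS translator unescapes backslash escapes in id selectors.
x <- html_elements(html, "#hello\\/world")

trimws(html_text(x))
```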