I'm scraping this HTML and I want to extract the text inside <span data-testid="distance">:
<span >
<span data-testid="distance">the text i want</span>
</span>
<span >
<span ><span>the other text i'm obtaining</span>
</span>
distancia <- hoteles_verdes %>%
  html_elements("span.class1") %>%
  html_text()
How can I target the element with data-testid="distance" in html_elements() so that I can then retrieve its text with html_text()?
This is my first time posting a question. Thanks!
Answer:
You can use a CSS attribute selector.
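For example, [attribute|="value"] selects elements whose attribute is exactly "value" or begins with "value-", so [data-testid|="distance"] selects the span whose data-testid attribute is "distance" (note the single quotes around the selector and the double quotes around the attribute value):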
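library(rvest)
# Select elements by their data-testid attribute, then extract their text
# (html_elements() supersedes html_nodes() since rvest 1.0.0)
hoteles_verdes %>%
  html_elements('[data-testid|="distance"]') %>%
  html_text()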
Result:
[1] "the text i want"
Data:
hoteles_verdes <- read_html('<span >
<span data-testid="distance">the text i want</span>
</span>
<span >
<span ><span>the other text im obtaining</span>
</span>')
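If the page could contain other data-testid values that start with "distance-", the exact-match selector [data-testid="distance"] is the stricter choice. The sketch below assumes rvest 1.0.0 or later (for html_elements()); the sample HTML, including the data-testid value "distance-walk", is hypothetical and exists only to show where the two selectors differ. It also shows an equivalent XPath expression.

```r
library(rvest)

# Hypothetical sample page: one element whose data-testid is exactly
# "distance" and one whose data-testid merely starts with "distance-"
page <- read_html('
  <div>
    <span><span data-testid="distance">1.2 km from centre</span></span>
    <span><span data-testid="distance-walk">15 min walk</span></span>
    <span><span>the other text</span></span>
  </div>')

# [attr|="value"] matches "distance" OR any value beginning with "distance-"
dash_match <- page %>%
  html_elements('[data-testid|="distance"]') %>%
  html_text()

# [attr="value"] matches only the exact value "distance"
exact_match <- page %>%
  html_elements('span[data-testid="distance"]') %>%
  html_text()

# Equivalent XPath expression for the exact match
xpath_match <- page %>%
  html_elements(xpath = '//span[@data-testid="distance"]') %>%
  html_text()

print(dash_match)   # both spans whose data-testid starts with "distance"
print(exact_match)  # only the span whose data-testid is exactly "distance"
```

On the question's HTML both selectors return the same result, because the only data-testid value there is "distance".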