Rvest html_nodes span div other items

I'm scraping this HTML and I want to extract the text inside the <span data-testid="distance"> element:

<span >
<span data-testid="distance">the text i want</span>
</span>
<span >
<span ><span>the other text i'm obtaining</span>
</span>

distancia <- hoteles_verdes %>% 
  html_elements("span.class1") %>%
  html_text()

My question is how to select the element with data-testid="distance" in html_elements() so that I can then retrieve its text with html_text().

This is my first question here. Thanks!

Answer:

You can use a CSS attribute selector.

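For example, the [attribute|="value"] selector matches elements whose "data-testid" attribute is exactly "distance" or starts with "distance-"; if you only want the exact value, [attribute="value"] works as well (note the single quotes around the selector and the double quotes inside it):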

library(rvest)

hoteles_verdes %>% 
  html_nodes('[data-testid|="distance"]') %>% 
  html_text()

Result:

[1] "the text i want"

Data:

hoteles_verdes <- read_html('<span >
                           <span data-testid="distance">the text i want</span>
                           </span>
                           <span >
                           <span ><span>the other text im obtaining</span>
                           </span>')
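
As a complement, here is a minimal sketch of the exact-match form, restricted to span elements, next to the |= form for comparison. The object names pagina and demo and the "distance-km" value are made up for illustration and do not come from the original page; the sample markup is the same as above, with the class attributes left out.

```r
library(rvest)

# Sample markup from the question (hypothetical object name: pagina).
pagina <- read_html('<span>
  <span data-testid="distance">the text i want</span>
</span>
<span>
  <span><span>the other text im obtaining</span></span>
</span>')

# span[attr="value"] keeps only <span> elements whose attribute
# is exactly "distance".
distancia <- pagina %>%
  html_elements('span[data-testid="distance"]') %>%
  html_text(trim = TRUE)

distancia
#> [1] "the text i want"

# Illustrative document (assumed values) showing how the two selectors differ.
demo <- read_html('<p>
  <span data-testid="distance">a</span>
  <span data-testid="distance-km">b</span>
</p>')

# |= matches "distance" and anything starting with "distance-".
prefijo <- demo %>%
  html_elements('[data-testid|="distance"]') %>%
  html_text()

# = matches "distance" only.
exacto <- demo %>%
  html_elements('[data-testid="distance"]') %>%
  html_text()
```

On the page from the question both selectors give the same result; the exact-match form is only safer if other elements might carry values such as "distance-km".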