Rvest not returning html_nodes when the id in the XPath has an accent in R

I am trying to scrape a table from an HTML file using rvest in R, but html_node is not working. I think it is because the id in the XPath is in Spanish and has an accent.

Here is the code:

library(rvest)
library(xml2)

url <- "https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html"
html <- read_html(url)
data <- html_node(html, xpath='//*[@id="ingreso-medio-per-cápita"]/table/tbody')

I have been googling a lot, but I cannot find a solution. I would really appreciate it if someone could help me!
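The question's XPath has two parts worth testing separately: the accented id and the trailing /tbody step. Browsers add a <tbody> element to the DOM when the served HTML leaves it out, so an XPath copied from browser developer tools can name an element that libxml2 (the parser behind xml2 and rvest) never creates. The sketch below checks both offline. It assumes rvest 1.0.0 or later (for minimal_html() and html_elements()), the markup is made up rather than taken from the real INE page, and writing the accent as the escape \u00e1 is an assumption meant to rule out script-file encoding problems.

```r
library(rvest)

# Made-up stand-ins for the INE page (not its real markup): a div with an
# accented id wrapping a table, once with an explicit <tbody> and once
# without. "\u00e1" is the Unicode escape for the accented "a".
with_tbody <- minimal_html(
  '<div id="ingreso-medio-per-c\u00e1pita"><table>
     <tbody><tr><td>A</td><td>1</td></tr></tbody>
   </table></div>'
)
without_tbody <- minimal_html(
  '<div id="ingreso-medio-per-c\u00e1pita"><table>
     <tr><td>A</td><td>1</td></tr>
   </table></div>'
)

# The question's XPath, with the accent written as an escape so the string
# is UTF-8 regardless of how the script file is saved, plus a variant
# that stops at <table>.
xp_tbody <- '//*[@id="ingreso-medio-per-c\u00e1pita"]/table/tbody'
xp_table <- '//*[@id="ingreso-medio-per-c\u00e1pita"]/table'

# Count matches for each combination of page and XPath.
n_with    <- length(html_elements(with_tbody, xpath = xp_tbody))
n_without <- length(html_elements(without_tbody, xpath = xp_tbody))
n_table   <- length(html_elements(without_tbody, xpath = xp_table))
```

Running the same two XPaths against html (the parsed page from the question) and comparing the counts would show which step, if either, fails to match on the real page.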

Answer:

I'm not sure what the problem is here, but since the part of the id before the accented character is still unique, you can match it with the XPath function starts-with():

library(rvest)
library(xml2)

url <- "https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html"
html <- read_html(url)

xpath <- '//div[starts-with(@id,"ingreso-medio-per-c")]/table'
data <- html_table(html_nodes(html, xpath = xpath))[[1]][1:3,]
#> Warning in table_fill(cells, trim = trim): NAs introduced by coercion

data
#> # A tibble: 3 x 3
#>   ``         `Trimestre 3 2021` `Trimestre 4 2021`
#>   <chr>                   <dbl>              <dbl>
#> 1 Total país               25.8               26.6
#> 2 Montevideo               32.5               33.5
#> 3 Interior                 21.5               22.3

Created on 2022-02-15 by the reprex package (v2.0.1)
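A small offline sketch of the starts-with() approach, assuming rvest 1.0.0 or later and using made-up markup in place of the INE page. It also illustrates the main caveat: the prefix has to be unique, so a second div whose id only shares the shorter prefix "ingreso-medio-p" is included on purpose and should not be matched. Note that this answer's XPath also stops at /table rather than /table/tbody.

```r
library(rvest)

# Made-up page: only the first div's id continues with the accented
# "c\u00e1pita"; the second id diverges earlier ("por" vs "per").
page <- minimal_html(paste0(
  '<div id="ingreso-medio-per-c\u00e1pita"><table>',
  '<tr><th>Region</th><th>Value</th></tr>',
  '<tr><td>A</td><td>1</td></tr>',
  '</table></div>',
  '<div id="ingreso-medio-por-hogar"><table>',
  '<tr><th>Region</th><th>Value</th></tr>',
  '<tr><td>B</td><td>2</td></tr>',
  '</table></div>'
))

# starts-with() compares only the ASCII prefix, so the accented character
# never has to appear in the XPath string.
xp <- '//div[starts-with(@id, "ingreso-medio-per-c")]/table'

# html_table() on a nodeset returns a list with one tibble per matched table;
# the first row of <th> cells becomes the header.
tables <- html_table(html_elements(page, xpath = xp))
```

contains(@id, "per-c") would work the same way in XPath 1.0, with the same requirement that the fragment be unique on the page.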

Answer:

Alternatively, you can use

library(rvest)
library(tidyverse)

url = 'https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html'

url %>% 
  read_html() %>% 
  html_table()

to get all the tables from the webpage.
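Because html_table() called on a whole document returns a list with one tibble per <table>, the wanted table still has to be picked out afterwards. A minimal offline sketch, assuming rvest 1.0.0 or later and made-up markup; selecting by column name is one option (an assumption here, not part of the answer) that avoids depending on the table's position in the page.

```r
library(rvest)

# Made-up two-table page standing in for the INE bulletin.
page <- minimal_html(paste0(
  '<table><tr><th>Region</th><th>Value</th></tr>',
  '<tr><td>A</td><td>1</td></tr></table>',
  '<table><tr><th>Year</th><th>Count</th></tr>',
  '<tr><td>2021</td><td>5</td></tr></table>'
))

# On the whole document, html_table() returns one tibble per <table>,
# in document order.
all_tables <- html_table(page)

# Keep the first table that has a "Region" column instead of hard-coding
# an index, which is less fragile if tables are added or removed.
has_region <- vapply(all_tables, function(t) "Region" %in% names(t), logical(1))
region_table <- all_tables[has_region][[1]]
```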