I am trying to scrape a table from an HTML page using rvest in R, but html_node is not working. I think it is because the id in the XPath is in Spanish and contains an accented character.
Here is the code:
library(rvest)
library(xml2)
url <- "https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html"
html <- read_html(url)
data <- html_node(html, xpath='//*[@id="ingreso-medio-per-cápita"]/table/tbody')
I have been googling a lot but cannot find a solution. I would really appreciate it if someone could help me!
CodePudding user response:
I'm not sure what the problem is here, but since the part of the id before the accented character is still unique, you can match it with the XPath function starts-with:
library(rvest)
library(xml2)
url <- "https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html"
html <- read_html(url)
xpath <- '//div[starts-with(@id,"ingreso-medio-per-c")]/table'
data <- html_table(html_nodes(html, xpath = xpath))[[1]][1:3,]
#> Warning in table_fill(cells, trim = trim): NAs introduced by coercion
data
#> # A tibble: 3 x 3
#>   ``         `Trimestre 3 2021` `Trimestre 4 2021`
#>   <chr>                   <dbl>              <dbl>
#> 1 Total país               25.8               26.6
#> 2 Montevideo               32.5               33.5
#> 3 Interior                 21.5               22.3
Created on 2022-02-15 by the reprex package (v2.0.1)
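If you want to try the starts-with approach without hitting the site, here is a minimal offline sketch. The HTML snippet, its column names and its values are made up for illustration (they are not taken from the INE page), and it assumes rvest 1.0 or later for minimal_html() and html_element(). The accented character is written as an escape so the script does not depend on the file's encoding.

```r
library(rvest)

# Toy document (hypothetical data): a div whose id contains an accented
# character, written as the escape \u00e1 so the source file stays ASCII.
page <- minimal_html(
  '<div id="ingreso-medio-per-c\u00e1pita">
     <table>
       <tr><th>zona</th><th>valor</th></tr>
       <tr><td>A</td><td>1</td></tr>
       <tr><td>B</td><td>2</td></tr>
     </table>
   </div>'
)

# Match only on the ASCII prefix of the id, then step down to the table.
xpath <- '//div[starts-with(@id, "ingreso-medio-per-c")]/table'
node <- html_element(page, xpath = xpath)

# Convert the matched <table> node to a tibble.
tbl <- html_table(node)
tbl
```

The prefix has to stay unique on the real page for this to pick the right div; if another id started with the same text, html_element would return the first match.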
CodePudding user response:
Or you can use
library(rvest)
# rvest re-exports the %>% pipe, so the full tidyverse is not needed here
url <- 'https://www3.ine.gub.uy/boletin/Boletin Ingresos 4to trimestre 2021.html'
url %>%
  read_html() %>%
  html_table()
to get all the tables from the webpage as a list, and then pick the one you need.
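Since that returns every table on the page, you still have to find the right one in the list. One way, sketched below on a made-up two-table document (hypothetical column names and values, rvest 1.0 or later assumed), is to keep the tables that contain a column you expect:

```r
library(rvest)

# Toy page with two tables (hypothetical data, not from the INE page).
page <- minimal_html(
  '<table><tr><th>a</th></tr><tr><td>1</td></tr></table>
   <table>
     <tr><th>trimestre</th><th>valor</th></tr>
     <tr><td>T4</td><td>26.6</td></tr>
   </table>'
)

# html_table() on a whole document returns a list, one entry per <table>.
tables <- html_table(page)

# Flag the tables that have the column we are looking for.
has_col <- vapply(tables, function(tb) "trimestre" %in% names(tb), logical(1))

# Take the first table that matched.
wanted <- tables[has_col][[1]]
wanted
```

This avoids the id entirely, so the accent never comes into play, at the cost of parsing tables you do not need.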