I'm trying to download an xlsx file with the code below:
library(rvest)
file <- "tesouro.csv"
site <- read_html("https://www.tesourotransparente.gov.br/publicacoes/boletim-resultado-do-tesouro-nacional-rtn/")
link <- site %>% html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>% html_attr("href")
download.file(
url = link,
mode = "w", destfile = file
)
But the download is an empty xlsx with the html code inside the spreadsheet:
<html>
<head>
<script>
var isMobile = /iPhone|iPad|iPod|Android/i.test(navigator.userAgent)
if(isMobile) {
window.location = "https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366"
}
</script>
</head>
<frameset COLS="*" border=0>
<frame SRC="https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366" frameborder=0>
</frameset>
</html>```
CodePudding user response:
You need to follow the link in the frame. Also, it's an xlsx, so you need to save it as one, not as a csv:
library(rvest)
site <- read_html(paste0("https://www.tesourotransparente.gov.br/publicacoes/",
"boletim-resultado-do-tesouro-nacional-rtn/"))
site %>%
html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>%
html_attr("href") %>%
read_html() %>%
html_nodes(xpath = "//frameset/frame") %>%
html_attr("src") %>%
httr::GET() %>%
httr::content("raw") %>%
writeBin("tesouro.xlsx")
Now we have
tesouro.xlsx