My question is very similar to this one. I want to download all Excel files (.xlsx) from this webpage, but the difference is (I think) that I do not have the same URL pattern as in the example. I have tried several variations without success. Any idea how to download these files? Also, if you could show how to read them directly into a data frame (without first saving them to my PC), that would be appreciated.
CodePudding user response:
A simple way to download the Excel files, one step at a time.
First, get the links.
library(rvest)

url <- "https://www.fondbolagen.se/fakta_index/statistik/"

read_html(url) |>
  html_elements("p") |>
  html_elements("a") |>
  html_attr("href") |>
  ## keep only the hrefs pointing at Excel files (.xls or .xlsx)
  (\(x) grep("\\.xls", x, value = TRUE))() |>
  ## the hrefs are relative, so prepend the site's base URL
  (\(x) sprintf("http://www.fondbolagen.se%s", x))() -> excel_links
Now, use the code from this Rich Scriven post to download the files. I have omitted the file creation instruction.
## create a folder for the downloads
dir.create("myexcel")
## save the current directory path for later
wd <- getwd()
## change working directory for the download
setwd("myexcel")
## download them all; mode = "wb" keeps the binary Excel
## files from being corrupted on Windows
lapply(excel_links, \(x) download.file(x, basename(x), mode = "wb"))
## reset working directory to original
setwd(wd)
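As for the second part of the question, reading the files straight into data frames: `readxl::read_excel()` cannot read from a URL directly, so a temporary file is still needed behind the scenes, but nothing permanent is written to disk. A sketch, assuming the readxl package is installed and `excel_links` holds the URLs scraped above (`read_excel_url` is a helper name I made up):

```r
library(readxl)  # install.packages("readxl") if needed

## download one Excel URL to a temp file, read it, discard the temp file
read_excel_url <- function(u) {
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(u)))
  on.exit(unlink(tmp))                          # clean up the temp copy
  download.file(u, tmp, mode = "wb", quiet = TRUE)
  read_excel(tmp)
}

## usage: a named list of tibbles, one per workbook
## dfs <- lapply(excel_links, read_excel_url)
## names(dfs) <- basename(excel_links)
```

If the workbooks all share the same columns, the list can then be combined into a single data frame with `do.call(rbind, dfs)` or `dplyr::bind_rows(dfs)`.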