I'm new to RStudio and biting off more than I can chew :)
I'm trying to use RStudio to import HTML tables from multiple web pages. There are thousands of tables, each with a unique URL, but the URLs all follow the same pattern, varying only by location search string, year, and month. Below are a few examples:
https://sunrise-sunset.org/search?location=mar del plata&year=2021&month=10#calendar
https://sunrise-sunset.org/search?location=bendigo victoria&year=1969&month=7#calendar
https://sunrise-sunset.org/search?location=parkville missouri usa&year=2025&month=2#calendar
I've tried using paste0() and c() to write a series of URLs that I can then use to import the data into R:
URLs <- paste0("https://sunrise-sunset.org/search?location=parkville missouri usa&year=",c(1969:2031),"&month=",c(1:12),"#calendar")
However, because the two c() calls for year and month are independent of one another, the vectors are simply matched up element by element (with the shorter one recycled), and I end up with 63 URLs instead of 756. Is there a way to generate the series of months, 1:12, for each year in the series of years, 1969:2031, using paste0()? Is this even the best approach for what I'm trying to accomplish? And if so, is there also a way to generate this series of years and months for multiple locations as well?
CodePudding user response:
One option is to use expand.grid() to create a data frame holding every combination of the pieces, then apply() to paste each row into a single string.
base_url <- 'https://sunrise-sunset.org/search?location=parkville missouri usa&'
year_url <- paste0("year=",c(1969:2031))
mon_url <- paste0("&month=",c(1:12),"#calendar")
out_url <- apply(expand.grid(base_url, year_url, mon_url), 1, paste, collapse = '')
length(out_url)
#> [1] 756
head(out_url)
#> [1] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1969&month=1#calendar"
#> [2] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1970&month=1#calendar"
#> [3] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1971&month=1#calendar"
#> [4] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1972&month=1#calendar"
#> [5] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1973&month=1#calendar"
#> [6] "https://sunrise-sunset.org/search?location=parkville missouri usa&year=1974&month=1#calendar"
Created on 2021-10-07 by the reprex package (v2.0.0)
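The same expand.grid() idea should extend to multiple locations, which the question also asks about. Below is a minimal sketch, not a tested scraper: the locations vector is just the three names from the question's example URLs, and the names grid and all_url are made up for illustration. It also percent-encodes the spaces in the location with utils::URLencode(), on the assumption that the site accepts "%20" in place of a literal space.

```r
# Hypothetical list of locations (taken from the example URLs in the question).
locations <- c("mar del plata", "bendigo victoria", "parkville missouri usa")

# Percent-encode each location so spaces become %20.
# vapply() is used so this works even where URLencode() is not vectorised.
loc_enc <- vapply(locations, utils::URLencode, character(1),
                  reserved = TRUE, USE.NAMES = FALSE)

# One row per (month, year, location) combination; the first column varies fastest.
grid <- expand.grid(month = 1:12,
                    year = 1969:2031,
                    location = loc_enc,
                    stringsAsFactors = FALSE)

# paste0() is vectorised over the columns, so no apply() is needed here.
all_url <- paste0("https://sunrise-sunset.org/search?location=", grid$location,
                  "&year=", grid$year,
                  "&month=", grid$month,
                  "#calendar")

length(all_url)   # 3 locations * 63 years * 12 months
all_url[1]
```

Keeping the combinations in a data frame also means the year, month, and location for each URL stay available later, for example to label each imported table.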
Another option is to use rep() to repeat each element of the shorter vectors (base_url and mon_url) as many times as there are elements in the longest vector (year_url); paste0() then recycles year_url to match:
paste0(rep(base_url, each = length(year_url)),
       year_url,
       rep(mon_url, each = length(year_url)))
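The rep() version relies on paste0() silently recycling year_url twelve times. As a sanity check, here is a small sketch that spells out both repetitions with times and each and confirms the result matches. The names rep_url and explicit_url are only for illustration.

```r
base_url <- 'https://sunrise-sunset.org/search?location=parkville missouri usa&'
year_url <- paste0("year=", 1969:2031)
mon_url  <- paste0("&month=", 1:12, "#calendar")

# Version from the answer: year_url (length 63) is recycled by paste0()
# to match rep(mon_url, each = 63), which has length 756.
rep_url <- paste0(rep(base_url, each = length(year_url)),
                  year_url,
                  rep(mon_url, each = length(year_url)))

# Same result with the recycling written out explicitly:
# cycle through all years once per month, and hold each month for all years.
explicit_url <- paste0(base_url,
                       rep(year_url, times = length(mon_url)),
                       rep(mon_url, each = length(year_url)))

identical(rep_url, explicit_url)
length(explicit_url)
anyDuplicated(explicit_url)   # 0 means every year/month pair appears exactly once
```

If the lengths did not divide evenly, the recycling would misalign the pieces rather than fail, so checking the length and uniqueness of the result is a cheap safeguard.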