The code works for a single URL, but when I loop over a list of URLs it fails with an error. I'm new to R, please help.
library(rvest)
for (url in data_list){
  webpage = read_html(url)
  extracted_urls = webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  extracted_urls = extracted_urls[grep("roster", extracted_urls)]
  extracted_urls
}
Error: `x` must be a string of length 1
Edit
Links in OP's comment.
data_list <- c(
  "ephsports.williams.edu",
  "wilsonphoenix.com",
  "wingatebulldogs.com",
  "ycpspartans.com"
)
CodePudding user response:
Variables assigned inside a for loop are overwritten on each iteration. Here, extracted_urls is clobbered every time, so after the loop it only holds the result for the last URL. Create a receiver object before the loop (try r <- list()) and add each iteration's result to it; the collected results are then still available once the loop has finished.
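A minimal sketch of that receiver pattern, under some assumptions: get_hrefs() and the fake_hrefs data are hypothetical stand-ins for the read_html() / html_nodes() / html_attr() steps, and the example.* URLs are placeholders, so the snippet runs without touching the network.

```r
# Hypothetical stand-in for the scraping step. In the real loop this would be
# read_html(url) %>% html_nodes("a") %>% html_attr("href").
fake_hrefs <- list(
  "https://site-a.example" = c("/sports/roster", "/news"),
  "https://site-b.example" = c("/about", "/teams/roster/2023", "/roster")
)
get_hrefs <- function(url) fake_hrefs[[url]]

urls <- names(fake_hrefs)

# Receiver object created before the loop.
r <- list()

for (url in urls) {
  hrefs <- get_hrefs(url)
  # Store this iteration's matches under the URL instead of overwriting
  # a single variable.
  r[[url]] <- hrefs[grep("roster", hrefs)]
}

# r now holds one character vector of roster links per URL.
r
```

Indexing the list by URL keeps each result traceable to the page it came from; r[["https://site-a.example"]] returns only that site's matches.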
CodePudding user response:
Since some of the URLs do not work, we can skip them using the possibly() function from purrr.
library(rvest)
library(tidyverse)

data_list <- c(
  'https://wilsonphoenix.com',
  'https://wingatebulldogs.com',
  'https://ycpspartans.com/sorry.ashx'
)
# The third link is broken.

# We create a function to get the required info.
roster = function(x){
  webpage = read_html(x)
  extracted_urls = webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  # Keep only the links that contain "roster".
  extracted_urls = extracted_urls[grep("roster", extracted_urls)]
  extracted_urls
}
Now we loop over the vector of URLs, data_list, skipping any that raise an error.
df <- map(data_list,
possibly(roster, otherwise = NA_character_))
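As a rough illustration of what the possibly() wrapper does with a failing input, here is a sketch that assumes a hypothetical fake_roster() in place of roster() and placeholder example.* URLs, so it runs offline. The naming and filtering steps are one possible follow-up, not part of the answer above.

```r
library(purrr)

# Hypothetical stand-in for roster(): it fails for one input, the way
# read_html() would fail on a broken link.
fake_roster <- function(x) {
  if (grepl("broken", x)) stop("could not open ", x)
  paste0(x, "/roster")
}

urls <- c(
  "https://site-a.example",
  "https://broken.example",
  "https://site-b.example"
)

# possibly() returns a wrapped function that yields `otherwise`
# instead of raising an error.
safe_roster <- possibly(fake_roster, otherwise = NA_character_)

# Name the results by URL so each entry can be traced back to its source.
res <- set_names(map(urls, safe_roster), urls)

# Drop the entries that fell back to the NA placeholder.
ok <- discard(res, function(x) identical(x, NA_character_))
names(ok)
```

Comparing against NA_character_ exactly, rather than testing is.na() on every element, keeps a page that simply has no roster links (an empty character vector) distinct from a page that failed to load.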