What is the mistake in this URL loop?

Time:03-17

The code works for a single URL, but when I loop over multiple URLs in a list it fails with an error. I'm new to R, please help.

library(rvest)

for (url in data_list) {
  webpage = read_html(url)

  extracted_urls = webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  extracted_urls = extracted_urls[grep("roster", extracted_urls)]
  extracted_urls
}

Error:

x must be a string of length 1


Edit

Links in OP's comment.

data_list <- c(
  "ephsports.williams.edu", 
  "wilsonphoenix.com", 
  "wingatebulldogs.com", 
  "ycpspartans.com"
)

CodePudding user response:

Variables assigned inside a for loop are overwritten on each iteration. Here, extracted_urls is clobbered every time, so only the last URL's result survives. Creating a receiver object before the loop (try r <- list()) lets you add each iteration's result to it step by step, and the collected results remain available after the loop finishes.
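A minimal sketch of the receiver-object idea, not the OP's exact setup: the two HTML strings in pages are hypothetical stand-ins for downloaded pages (read_html() also accepts literal HTML), so the example runs without network access. The names pages and results are assumptions made for illustration.

```r
library(rvest)

# Hypothetical stand-ins for real pages, so the sketch runs offline
pages <- c(
  site_a = '<html><body><a href="/sports/roster">Roster</a><a href="/news">News</a></body></html>',
  site_b = '<html><body><a href="/schedule">Schedule</a></body></html>'
)

# Receiver object created before the loop, one slot per input
results <- vector("list", length(pages))
names(results) <- names(pages)

for (i in seq_along(pages)) {
  hrefs <- read_html(pages[[i]]) %>%
    html_nodes("a") %>%
    html_attr("href")
  # Store this iteration's matches in its own slot instead of overwriting
  results[[i]] <- hrefs[grepl("roster", hrefs)]
}
```

After the loop, results holds one character vector per page, and a page without matching links yields an empty vector rather than being lost.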

CodePudding user response:

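Some of the URLs do not work, so we can skip them with purrr's possibly() function. Note also that each URL needs its scheme (https://); without it, read_html() treats the string as a local file path.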

library(rvest)
library(tidyverse)

data_list <- c(
  'https://wilsonphoenix.com',
  'https://wingatebulldogs.com',
  'https://ycpspartans.com/sorry.ashx'
)
# the third link is broken

# we create a function to get the required info
roster = function(x) {
  webpage = read_html(x)
  extracted_urls = webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  # keep only the links that contain "roster"
  extracted_urls[grep("roster", extracted_urls)]
}

Now we loop over the vector of URLs, data_list, skipping any that raise an error.

df <- map(data_list,
          possibly(roster, otherwise = NA_character_))
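To see what possibly() does in isolation, here is a small offline sketch. The function risky is hypothetical and only stands in for roster(): it fails on an empty string the way read_html() fails on a broken link.

```r
library(purrr)

# Hypothetical function that errors on some inputs (stand-in for roster())
risky <- function(x) {
  if (!nzchar(x)) stop("empty input")
  toupper(x)
}

# Wrapped version returns NA_character_ instead of raising an error
safe_risky <- possibly(risky, otherwise = NA_character_)

out <- map(c("a", "", "b"), safe_risky)

# Optionally drop the failed entries afterwards
ok <- discard(out, ~ identical(.x, NA_character_))
```

The failed element is kept as NA in out, so positions still line up with the input vector; discard() is one way to remove those placeholders when they are no longer needed.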
                  