(R) Webscraping Error : arguments imply differing number of rows: 1, 0


I am working with the R programming language.

In a previous question, I wrote the following code to scrape pizza restaurant listings from Yellow Pages:

a = "https://www.yellowpages.ca/search/si/"
b = "/pizza/Canada"

list_results = list()

for (i in 1:391) {
  url_i = paste0(a, i, b)
  s_i = data.frame(scraper(url_i))
  ss_i = data.frame(i, s_i)
  print(ss_i)
  list_results[[i]] <- ss_i
}

final = do.call(rbind.data.frame, list_results)

My Problem: I noticed that after the 60th page, I get the following error:

Error in data.frame(i, s_i) : 
  arguments imply differing number of rows: 1, 0
In addition: Warning message:
In for (i in seq_along(specs)) { :
  closing unused connection 

To investigate, I went to the 60th page in my browser (screenshot omitted).

My Question: Is there something that I can do differently to try and move past the 60th page, or is there some internal limitation within Yellow Pages that is preventing me from scraping further?

Thanks!

CodePudding user response:

This is a limit on the Yellow Pages side that prevents you from continuing to further pages: once a page returns no listings, scraper() yields an empty data frame, and data.frame(i, s_i) fails because i has one row while s_i has zero. A solution is to assign the return value of scraper(), check the number of rows, and break out of the for loop when it is 0.

a <- "https://www.yellowpages.ca/search/si/"
b <- "/pizza/Canada"
list_results <- list()

for (i in 1:391) {
  url_i <- paste0(a, i, b)

  # scrape the page first, then inspect how many rows came back
  s <- scraper(url_i)
  message(paste("page number:", i, "\trows:", nrow(s)))

  if (nrow(s) > 0L) {
    s_i  <- as.data.frame(s)
    ss_i <- data.frame(i, s_i)
  } else {
    # an empty page means we have hit the Yellow Pages limit
    message("empty page, bailing out...")
    break
  }
  list_results[[i]] <- ss_i
}

final <- do.call(rbind.data.frame, list_results)
dim(final)
# [1] 2100    3
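
As a further safeguard, the call to the scraper can be wrapped in tryCatch() so that a dropped connection (like the "closing unused connection" warning above) is treated the same as an empty page instead of aborting the whole run. This is only a sketch, assuming the same scraper() function from the earlier question; everything else is base R.

a <- "https://www.yellowpages.ca/search/si/"
b <- "/pizza/Canada"
list_results <- list()

for (i in 1:391) {
  url_i <- paste0(a, i, b)

  # any error raised by scraper() becomes an empty data frame
  # instead of stopping the loop
  s <- tryCatch(scraper(url_i), error = function(e) data.frame())

  if (nrow(s) == 0L) {
    message("page ", i, ": no rows returned, stopping")
    break
  }
  list_results[[i]] <- data.frame(i, s)
}

final <- do.call(rbind.data.frame, list_results)

The result is the same as above; the only difference is that a transient network failure on one page ends the loop gracefully rather than leaving you with an uncaught error and a partially filled list.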