Storing JSON in a List

Time:09-26

I have a series of websites whose URLs share the same format (e.g. www.website1.com, www.website2.com, www.website3.com, etc.) and that I want to scrape.

I am using the following code to scrape the JSON from each website. I want to use a "timeout" function that forces the loop to skip to the next iteration if a certain amount of time has elapsed (e.g. 1 second):

library(R.utils)
library(jsonlite)
    
part1 = "https://website"
part2 = "extension="
part3 = ".com"

res <- list()

for (i in 1:10) {
  print(i)
  tryCatch({
    res[[i]] <- withTimeout({
      url_i <- paste0(part1, i + 1, part2, i, part3)
      r_i <- data.frame(fromJSON(url_i))
      res[[i]] <- r_i
      print(i)
      }, timeout=1)
  }, 
  TimeoutException=function(ex) {
    message("Timeout. Skipping.")
    res[[i]] <- NULL
  })
}

The loop seems to run, but the results do not contain the scraped data:

> res
[[1]]
NULL

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

[[7]]
[1] 7

[[8]]
[1] 8

[[9]]
[1] 9

[[10]]
[1] 10

However, the "r_i" objects seem to have been created. For example, if I inspect "r_i", it holds a valid result.

I can't understand why "r_i" is working, but storing the results in a list is not.

Does anyone have any ideas about this?

CodePudding user response:

The timeout was too low; set it higher (though 10 seconds may be more than you need). You also had two res[[i]] <- assignments in your code. There should be only one, directly to the left of R.utils::withTimeout(). In addition, the last expression inside withTimeout() should evaluate to the "data.frame", because that is the value withTimeout() returns. Before, the block ended with print(i), so it stored just the i's returned by print; the print should come one line earlier. Maybe cat(i, '\r') is better: it doesn't clutter the console and looks somewhat cool.

library(R.utils)
library(jsonlite)

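# part1, part2 and part3 are defined as in the question
res <- list()

for (i in 1:10) {
  cat(i, '\r')  # progress indicator that overwrites itself on the console
  # Assign the value of tryCatch() so that the timeout fallback is stored too;
  # an assignment inside the handler would only change a local copy of res.
  res[[i]] <- tryCatch({
    withTimeout({
      url_i <- paste0(part1, i + 1, part2, i, part3)
      # Last expression: this is the value withTimeout() returns
      data.frame(fromJSON(url_i))
    }, timeout=10)
  },
  TimeoutException=function(ex) {
    message("Timeout. Skipping.")
    NA
  })
}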

I'm not sure exactly what kind of output you want, but at least res contains something now.
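
The points in this answer about what a block returns and where the assignment belongs can be checked without any network access. The following is a minimal sketch in base R only: fake_fetch() and its simulated failure are hypothetical stand-ins for fromJSON() and a timeout, not part of the original code. It illustrates that a braced block evaluates to its last expression, that assigning the value of tryCatch() stores the fallback as well, and that an assignment made inside a handler function does not reach the outer list.

```r
# Hypothetical stand-in for fromJSON(): fails for i == 2,
# otherwise returns a small data.frame.
fake_fetch <- function(i) {
  if (i == 2) stop("simulated failure")
  data.frame(id = i, value = i * 10)
}

# 1. A braced block evaluates to its LAST expression.
#    print() returns its argument, so ending the block with print(1)
#    stores the number 1 instead of the fetched data.
wrong <- {
  r_i <- fake_fetch(1)
  print(1)
}
right <- {
  fake_fetch(1)
}

# 2. Assigning the value of tryCatch() stores either the result
#    or the fallback returned by the handler.
res <- list()
for (i in 1:3) {
  res[[i]] <- tryCatch(
    fake_fetch(i),
    error = function(e) {
      message("Failed for i = ", i, ". Skipping.")
      NA
    }
  )
}

# 3. An assignment inside a handler only modifies a copy that is
#    local to the handler function; the outer list stays untouched.
res2 <- list()
tryCatch(
  stop("simulated failure"),
  error = function(e) {
    res2[[1]] <- NA
  }
)

is.data.frame(right)  # TRUE
length(res2)          # 0
```

With the real code, the same pattern applies when error = is replaced by TimeoutException = and fake_fetch(i) by the withTimeout() call.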
