How can I fix this issue in R with webscraping?

Time:03-10

I am trying to pull data from more than 800 links and put it into a table. I have tried using the Chrome SelectorGadget extension but cannot work out how to get it to loop over the links. I must have spent 40 hours on this and keep getting errors. I need to pull the same information from li:nth-child(8), li:nth-child(8) strong and a couple of other text boxes on each page. I followed a YouTube video, changing only the names and links and otherwise keeping the code the same, but it just will not work.

library(tidyverse)  # attaches dplyr and the %>% pipe
library(rvest)      # read_html(), html_nodes(), html_text(), html_attr()

results <- read_html("https://www.artemis.bm/deal-directory/")

issuers <- results %>% html_nodes("#table-deal a") %>% html_text()


url <- results %>% html_nodes("#table-deal a") %>% html_attr("href")

get_modelling <- function(url_link) {
  issuer_page <- read_html(url_link)
  modelling <- issuer_page %>%
    html_nodes("#info-box li:nth-child(4)") %>%
    html_text()
  return(modelling)
}

issuer_modelling <- sapply(url, FUN = get_modelling)

I get these issues:

Warning message:
In for (i in seq_along(specs)) { :
  closing unused connection 4 (https://www.artemis.bm/deal-directory/bellemeade-re-2022-1-ltd/)

Called from: open.connection(x, "rb")
Browse[1]> data.table::data.table(placement = unlist(issue_placement))[,.N, placement]
Error during wrapup: object 'issue_placement' not found
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Browse[1]> c
> data.table::data.table(placement = unlist(issue_placement))[,.N, placement]
Error in unlist(issue_placement) : object 'issue_placement' not found

CodePudding user response:

We can use a simple for loop:

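# Create an empty vector to collect the results
df = c()

# head(url) keeps only the first six links; use url to loop over all of them
for (i in head(url)) {
  dd = i %>% read_html() %>% html_nodes("#info-box li:nth-child(4)") %>%
    html_text()
  # Note: c(dd, df) prepends, so results end up in reverse link order
  df = c(dd, df)
}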

 df
[1] "Risk modelling / calculation agents etc: AIR Worldwide" "Risk modelling / calculation agents etc: AIR Worldwide"
[3] "Risk modelling / calculation agents etc: RMS"           "Risk modelling / calculation agents etc: AIR Worldwide"
[5] "Risk modelling / calculation agents etc: AIR Worldwide" "Risk modelling / calculation agents etc: AIR Worldwide"
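
To go from the six-link loop above to all 800+ links and more than one field per page, something along these lines might work. This is only a sketch: `extract_fields` and `scrape_deal` are hypothetical helper names, `html_element()` assumes rvest 1.0 or later, and the `placement` selector is a guess based on the `li:nth-child(8)` mentioned in the question. Wrapping each request in `tryCatch()` means one failed page gives a row of NA instead of stopping the whole run.

```r
library(rvest)

# Hypothetical helper: text of the first node matching each CSS selector
# in an already-parsed page; NA where a selector matches nothing.
extract_fields <- function(page, selectors) {
  vapply(
    selectors,
    function(css) html_text(html_element(page, css), trim = TRUE),
    character(1)
  )
}

# Hypothetical helper: fetch and parse one deal page. On any error
# (timeout, 404, ...) report it and return NAs so the loop keeps going.
scrape_deal <- function(url_link, selectors) {
  tryCatch(
    extract_fields(read_html(url_link), selectors),
    error = function(e) {
      message("Skipping ", url_link, ": ", conditionMessage(e))
      setNames(rep(NA_character_, length(selectors)), names(selectors))
    }
  )
}

selectors <- c(
  modelling = "#info-box li:nth-child(4)",  # selector used in the answer above
  placement = "#info-box li:nth-child(8)"   # assumed, taken from the question
)

# Offline check on a tiny inline page (made-up markup, four list items only)
demo_page <- read_html(
  '<ul id="info-box"><li>a</li><li>b</li><li>c</li>
   <li>Risk modelling / calculation agents etc: RMS</li></ul>'
)
demo_fields <- extract_fields(demo_page, selectors)
print(demo_fields)

# With the real links (the `url` vector from the question), one row per deal:
# rows  <- lapply(url, scrape_deal, selectors = selectors)
# deals <- as.data.frame(do.call(rbind, rows))
```

Adding another field is then just one more named entry in `selectors`. A short `Sys.sleep()` between requests may also be worth adding when looping over hundreds of pages.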