Iterate over a list of URLs in R

Time:09-28

I want to import a .txt file that contains a list of URLs, extract data from each one, and save the results in a CSV file, but I'm stuck.

Importing the .txt file works fine, but when I try to iterate over each row I only extract from the first one.

library(rvest)
library(tidyr)
library(dplyr)

for(i in seq(list_url)) {
    text <- read_html(list_url$url[i]) %>%html_nodes("tr~ tr  tr strong") %>%html_text()}

I only get the result from the first URL, as a single value. I want a data frame with everything extracted from all the URLs.

Edit: the list_url file is full of URLs like these:

http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=_b8I7G9olKAukGNlsRE6RHSYaYPu8YLjhTEW15HEuj4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=ewwF4WmHAnOkCg8Y_XIFH705H_O5hJL9uB5hztOhrsE.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=Z9BDo7ACNDbsUwTiVFTe9aKFfcLAxxnU2AtL6DCloX4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=NZPRa9SoKHVJQcZ64_4zVgcLSNKmHZ4MtorPu23MUPg.

CodePudding user response:

Are you sure the text variable holds the result of the first URL? The for loop overwrites text on every iteration, so after the loop it should hold the result of the last URL processed.
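A small sketch of what is probably going on, using a made-up three-row data frame in place of the real list_url and toupper() in place of the scraping step (both are assumptions for illustration). If list_url is a data frame, as list_url$url suggests, seq(list_url) counts its columns rather than its URLs, so the original loop may run only once. Even with a correct index, assigning to text inside the loop keeps only the last result:

```r
# Hypothetical stand-in for the imported file: one column named url.
list_url <- data.frame(url = c("a", "b", "c"), stringsAsFactors = FALSE)

# seq() on a data frame runs over its columns: a single index here.
col_index <- seq(list_url)

# seq_along() on the url column gives one index per URL.
url_index <- seq_along(list_url$url)

# Assigning to the same name on every pass keeps only the last value.
text <- NULL
for (i in url_index) {
  text <- toupper(list_url$url[i])  # stand-in for the read_html() pipeline
}

print(col_index)
print(url_index)
print(text)
```

Printing the three objects makes it easy to compare how many iterations each index would produce and what survives the loop.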

lapply() is perfect for this and avoids the issues that come with for-loops.

This does what you are trying to achieve:

text <- 
  lapply(list_url$url,
         \(x) read_html(x) %>% 
           html_nodes("tr~ tr  tr strong") %>% 
           html_text())

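With sapply() instead, the result is simplified to a vector or matrix when every URL returns the same number of strings; otherwise it falls back to a list. The simplified form might be helpful for the following steps. You might also want to look up purrr, which provides a family of map() functions that work much like the *apply() functions.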
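Since the goal is a data frame saved as a CSV file, here is a minimal sketch of one way to get there from the list that lapply() returns. The scrape_one() function, the placeholder URLs, and the temporary output path are all hypothetical; in the real script scrape_one() would wrap the read_html() pipeline shown above, and the long one-row-per-string layout is just one possible choice:

```r
# Hypothetical stand-in for the scraping step, so the sketch runs offline.
scrape_one <- function(url) {
  c(paste("first value from", url), paste("second value from", url))
}

urls <- c("page-1", "page-2")  # placeholders, not real addresses

# One character vector per URL, kept in a list.
results <- lapply(urls, scrape_one)

# Long format: one row per extracted string, tagged with its source URL.
df <- data.frame(
  url  = rep(urls, lengths(results)),
  text = unlist(results),
  stringsAsFactors = FALSE
)

# Temporary file for the sketch; replace with a real path such as "output.csv".
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)
```

Repeating each URL lengths(results) times keeps the rows aligned even when pages return different numbers of strings.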

CodePudding user response:

You should create an output object, then populate element "i" of that output object on each iteration. As written, your code overwrites the same object on every pass, so none of the intermediate results are kept.

library(rvest)
library(tidyr)
library(dplyr)

text <- vector('list', length = length(list_url$url)) # create the output object, one slot per URL
for (i in seq_along(list_url$url)) { # seq(list_url) would count columns, not URLs
    text[[i]] <- read_html(list_url$url[i]) %>% html_nodes("tr~ tr  tr strong") %>% html_text()
}
text