Storing the output of a for loop in a container / object

Time:08-09

This question is more theoretical because I was not able to construct a reproducible example, but after many hours of trying I need your help.

I have a folder with 3 PDF files.

This code does what I want: it prints the 3 tables, one after another, to the screen.

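library(pdftools)
library(here)
library(magrittr)  # provides the %>% pipe

# List the PDF files in the folder (file names only)
pdf_files <- list.files(here("pdf_xxx"), pattern = "\\.pdf$")

for (i in seq_along(pdf_files)) {
  # Extract the text of every page and split it into lines
  PDF <- pdf_text(file.path(here("pdf_xxx"), pdf_files[i])) %>%
    readr::read_lines()
  print(PDF)
}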

My question is: how can I store this output in a data frame?

For this I used the following code:


# create an empty data frame with one row per PDF file
results <- data.frame(text = character(length(pdf_files)), stringsAsFactors = FALSE)

for (i in seq_along(pdf_files)) {
  PDF <- pdf_text(file.path(here("pdf_xxx"), pdf_files[i])) %>%
    readr::read_lines()
  results$text[i] <- PDF[i]
}

results

Here I only get 3 rows. Is this because length(pdf_files) is 3?


How can I store the printed output in a data frame?

Update: with 2 PDF files, print(PDF) in the for loop gives the output below, and I want to save it to an object:

for (i in seq_along(pdf_files)) {
  PDF <- pdf_text(file.path(here("pdf_dienstplan"), pdf_files[i])) %>%
    readr::read_lines()
  print(PDF)
}

[1] "     blabla                                  
[2] "     blabla                        
[3] ""                                                   
[4] "     Datum        
[5] "Sa     01.10.2022 
[6] ""                 
[7] "So     02.10.2022 
[8] ""                 
[9] "Mo     03.10.2022 
[10] ""                
[11] "Di     04.10.2022
[12] ""                
[13] "Mi     05.10.2022
[14] ""                
[15] "Do     06.10.2022
[16] ""                
[17] "Fr     07.10.2022
[18] ""                
[19] "Sa     08.10.2022
[20] ""                
[21] "So     09.10.2022
[22] ""                
[23] "Mo     10.10.2022
[24] ""                
[25] "Di     11.10.2022
[26] ""                
[27] "Mi     12.10.2022
[28] ""                
[29] "Do     13.10.2022
[30] ""                
[31] "Fr     14.10.2022
[32] ""                
[33] "Sa     15.10.2022
[34] ""                
[35] "So     16.10.2022
[36] ""                
[37] "Mo     17.10.2022
[38] ""                
[39] "Di     18.10.2022
[40] ""                
[41] "Mi     19.10.2022
[42] ""                
[43] "Do     20.10.2022
[44] ""                
[45] "Fr     21.10.2022
[46] ""                
[47] "Sa     22.10.2022
[48] ""                
[49] "So     23.10.2022
[50] ""                
[51] "Mo     24.10.2022
[52] ""                
[53] "Di     25.10.2022
[54] ""                
[55] "Mi     26.10.2022
[56] ""                
[57] "Do     27.10.2022
[58] ""                
[59] "Fr     28.10.2022
[60] ""                
[61] "Sa     29.10.2022
[62] ""                
[63] "So     30.10.2022
[64] ""                
[65] "Mo     31.10.2022
[1] "     blabla   
[2] "     blabla
[3] ""                 
[4] "     Datum        
[5] "Do     01.09.2022 
[6] ""                 
[7] "Fr     02.09.2022 
[8] ""                 
[9] "Sa     03.09.2022 
[10] ""                
[11] "So     04.09.2022
[12] ""                
[13] "Mo     05.09.2022
[14] ""                
[15] "Di     06.09.2022
[16] ""                
[17] "Mi     07.09.2022
[18] ""                
[19] "Do     08.09.2022
[20] ""                
[21] "Fr     09.09.2022
[22] ""                
[23] "Sa     10.09.2022
[24] ""                
[25] "So     11.09.2022
[26] ""                
[27] "Mo     12.09.2022
[28] ""                
[29] "Di     13.09.2022
[30] ""                
[31] "Mi     14.09.2022
[32] ""                
[33] "Do     15.09.2022
[34] ""                
[35] "Fr     16.09.2022
[36] ""                
[37] "Sa     17.09.2022
[38] ""                
[39] "So     18.09.2022
[40] ""                
[41] "Mo     19.09.2022
[42] ""                
[43] "Di     20.09.2022
[44] ""                
[45] "Mi     21.09.2022
[46] ""                
[47] "Do     22.09.2022
[48] ""                
[49] "Fr     23.09.2022
[50] ""                
[51] "Sa     24.09.2022
[52] ""                
[53] "So     25.09.2022
[54] ""                
[55] "Mo     26.09.2022
[56] ""                
[57] "Di     27.09.2022
[58] ""                
[59] "Mi     28.09.2022
[60] ""                
[61] "Do     29.09.2022
[62] ""                
[63] "Fr     30.09.2022

Answer:

That is because you index PDF[i], so for each file you keep only its i-th line. And yes, results has only 3 rows because you created it with length(pdf_files) rows, i.e. one row per PDF file.
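A small sketch of the effect without any PDF files: two made-up character vectors (hypothetical contents, not the real data) stand in for the lines that pdf_text() %>% readr::read_lines() returns for each file.

```r
# Two made-up "files", standing in for the lines read from each PDF
fake_lines <- list(
  c("header", "", "Sa     01.10.2022", "", "So     02.10.2022"),
  c("header", "", "Do     01.09.2022")
)

results <- data.frame(text = character(length(fake_lines)), stringsAsFactors = FALSE)
for (i in seq_along(fake_lines)) {
  PDF <- fake_lines[[i]]
  # PDF[i] is the i-th LINE of file i, not the whole file
  results$text[i] <- PDF[i]
}
print(results)  # 2 rows: "header" (line 1 of file 1) and "" (line 2 of file 2)

# Keeping every line of every file instead
all_lines <- unlist(fake_lines)
length(all_lines)  # 8
```

So the loop fills exactly one row per file, and most of each file's lines are simply never stored.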

The following should work:

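# Preallocate one list element per PDF file
res_list <- vector("list", length(pdf_files))
for (i in seq_along(pdf_files)) {
  # Keep ALL lines of file i instead of only PDF[i]
  res_list[[i]] <- pdf_text(file.path(here("pdf_xxx"), pdf_files[i])) %>%
    readr::read_lines()
}

# The files have different numbers of lines, so stack them into one
# long data frame (one row per line) rather than one column per file
result <- data.frame(
  file = rep(pdf_files, lengths(res_list)),
  text = unlist(res_list),
  stringsAsFactors = FALSE
)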
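Because the files have different numbers of lines (65 and 63 in the output above), as.data.frame() on the list of per-file vectors stops with an "arguments imply differing number of rows" error. Below is a sketch of stacking the lines into one long data frame instead, with made-up vectors and hypothetical file names in place of real PDF text.

```r
# Made-up per-file lines standing in for pdf_text() %>% readr::read_lines()
res_list <- list(
  c("     Datum", "Sa     01.10.2022", "", "So     02.10.2022"),
  c("     Datum", "Do     01.09.2022", "")
)
pdf_files <- c("october.pdf", "september.pdf")  # hypothetical file names

# as.data.frame(res_list) would fail here: 4 and 3 rows cannot be combined

# One row per line, recording the source file and the line number within it
result <- data.frame(
  file = rep(pdf_files, lengths(res_list)),
  line = unlist(lapply(res_list, seq_along)),
  text = unlist(res_list),
  stringsAsFactors = FALSE
)
print(result)
```

The long format keeps every line no matter how many each file has, and the file column lets you split or filter by source later.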

Answer:

First of all, thanks to all who helped me. I have changed my strategy:

With pdf_combine() from the qpdf package, I combine all PDFs in my folder (here 3) into one combined PDF.

Then I use pdf_text() from the pdftools package together with readr::read_lines().

Here is the code:

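library(qpdf)
library(pdftools)
library(here)
library(tidyverse)

# Full paths to all PDFs in the folder
pdf_files <- list.files(here("pdf_xxx"), pattern = "\\.pdf$", full.names = TRUE)

my_path <- here("pdf_xxx", "combined_xxx.pdf")

# Don't feed an earlier combined file back in when the script is rerun
pdf_files <- setdiff(pdf_files, my_path)

# Merge all PDFs into a single file
pdf_combine(input = pdf_files, output = my_path)

# Extract the text and split it into lines (one character vector)
PDF <- pdf_text(my_path) %>%
  readr::read_lines()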
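PDF is still a plain character vector of lines. If the goal is a table, one possible next step is to keep only the lines that start with a weekday abbreviation and a date and split them into columns. This sketch assumes every data row starts like the "Sa     01.10.2022" lines in the question's output; the rest of those lines was cut off there, so it is kept unparsed in a hypothetical rest column. The short sample vector stands in for the real PDF.

```r
# A few lines in the format shown in the question (stand-in for PDF)
PDF <- c("     blabla", "", "     Datum",
         "Sa     01.10.2022", "", "So     02.10.2022",
         "Do     01.09.2022")

# Assumed pattern: two-letter weekday, spaces, date as dd.mm.yyyy, then anything
row_re <- "^([A-Z][a-z])[[:space:]]+([0-9]{2}\\.[0-9]{2}\\.[0-9]{4})(.*)$"
rows <- PDF[grepl(row_re, PDF)]  # drops headers and empty lines

schedule <- data.frame(
  weekday = sub(row_re, "\\1", rows),
  date    = as.Date(sub(row_re, "\\2", rows), format = "%d.%m.%Y"),
  rest    = trimws(sub(row_re, "\\3", rows)),
  stringsAsFactors = FALSE
)
print(schedule)
```

Parsing the date with as.Date() means the rows can be sorted or filtered by date afterwards, regardless of which source PDF they came from.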