Combining multiple files containing only one number

Time:11-07

I have a set of files that each contain a single number. I want to combine them into a data frame with one column holding the filename and one column holding the number from that file. I tried the code below, but reading the files failed.

Example for a single file that works:

> read.csv(file="file1.stats",check.names = F)
[1] 2659344201
<0 rows> (or 0-length row.names)


> read.csv(file="file2.stats",check.names = F)
[1] 92424242
<0 rows> (or 0-length row.names)

Combining does not work:

file_list = list.files(pattern=".stats")    
datalist = lapply(file_list, function(x){
  dat = read.csv(file=x,check.names = F)
})

error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input

 joined <- join_all(dfs = datalist,by = "V1",type ="full" )  

CodePudding user response:

The following should work, though it is untested since I don't have your files.

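library(data.table)
# Anchor the regex so only names ending in ".stats" match
file_list = list.files(pattern = "\\.stats$")
# Name the list by file so idcol can record which file each row came from
data_table = rbindlist(lapply(setNames(file_list, file_list), function(x){
  fread(file = x, header = FALSE)
}), idcol = "filename")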

rbindlist stacks the data tables in the list row by row into a single table, so there is no need for a join.
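
On the error in the question: read.table reports "no lines available in input" when a file has no lines at all, so at least one file matching the pattern is probably empty. Also, read.csv defaults to header = TRUE, which is why the single-file calls show the number as a column name with 0 rows. Below is a minimal base R sketch that tolerates both, assuming NA is an acceptable value for an empty file. The helper read_single_number and the demo file names are made up for illustration and are not from the original post.

```r
# Hypothetical helper: returns the first number in a file, or NA if the file is empty.
read_single_number <- function(path) {
  # scan() returns a zero-length vector for an empty file instead of raising an error
  value <- scan(path, what = numeric(), quiet = TRUE)
  if (length(value) == 0) NA_real_ else value[1]
}

# Demo files in a temporary directory (assumed names); the third one is empty
demo_dir <- file.path(tempdir(), "stats_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("2659344201", file.path(demo_dir, "file1.stats"))
writeLines("92424242", file.path(demo_dir, "file2.stats"))
file.create(file.path(demo_dir, "file3.stats"))

# Anchored regex: only names ending in ".stats"
file_list <- list.files(demo_dir, pattern = "\\.stats$", full.names = TRUE)

# One row per file: the filename and the number it contains
result <- data.frame(
  filename = basename(file_list),
  value = vapply(file_list, read_single_number, numeric(1), USE.NAMES = FALSE)
)
print(result)
```

Reading the values as numeric (double) rather than integer matters here, because 2659344201 is larger than R's maximum 32-bit integer.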

CodePudding user response:

A solution based on purrr::map_dfr:

library(tidyverse)

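# create 10 one-number csv files in the /tmp directory
walk(1:10, ~ write(sample(1111111:9999999, 1), paste0("/tmp/file", .x, ".csv")))

# get the names of the files (pattern is a regex, not a glob)
files <- dir("/tmp", pattern = "\\.csv$")

# read each file and prepend its name; map_dfr row-binds the results
map_dfr(files, ~ data.frame(fname = .x, read.csv(file.path("/tmp", .x), header = FALSE)))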

#>         fname      V1
#> 1   file1.csv 6803283
#> 2  file10.csv 4835472
#> 3   file2.csv 2645034
#> 4   file3.csv 9766210
#> 5   file4.csv 8570853
#> 6   file5.csv 7384528
#> 7   file6.csv 7609801
#> 8   file7.csv 1244294
#> 9   file8.csv 5098257
#> 10  file9.csv 4940697

Alternatively, using dplyr:

library(tidyverse)

# create 10 one-number csv files in the /tmp directory
walk(1:10, ~ write(sample(1111111:9999999, 1), paste0("/tmp/file", .x, ".csv")))

# get the names of the files (pattern is a regex, not a glob)
files <- dir("/tmp", pattern = "\\.csv$")

# one row per file; the unnamed data frame returned by read.csv becomes column V1
tibble(fnames = files) %>%
  rowwise() %>%
  mutate(read.csv(file.path("/tmp", fnames), header = FALSE))

#> # A tibble: 10 × 2
#> # Rowwise: 
#>    fnames          V1
#>    <chr>        <int>
#>  1 file1.csv  3484087
#>  2 file10.csv 9333635
#>  3 file2.csv  1455252
#>  4 file3.csv  9665802
#>  5 file4.csv  8401813
#>  6 file5.csv  5864912
#>  7 file6.csv  9494831
#>  8 file7.csv  5230778
#>  9 file8.csv  9717400
#> 10 file9.csv  9761327
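
As a side note, purrr 1.0.0 marks map_dfr as superseded in favour of map() followed by list_rbind(). A rough equivalent of the first approach is sketched below, assuming purrr >= 1.0.0 is installed; the temporary directory and the file contents are invented for the demo.

```r
library(purrr)

# Demo data (assumed): three one-number csv files in a temporary directory
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
walk(1:3, function(i) write(i * 1000, file.path(demo_dir, paste0("file", i, ".csv"))))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its base name so list_rbind() can store it in a column
named_files <- set_names(files, basename(files))

combined <- list_rbind(
  map(named_files, function(f) read.csv(f, header = FALSE)),
  names_to = "fname"
)
print(combined)
```

The names_to argument plays the role that the explicit fname column plays in the map_dfr version.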