I have a set of files that each contain a single number. I want to combine them into a data frame with one column holding the file name and one column holding the number from that file. I tried the code below, but reading the files failed.
Example for a single file that works:
> read.csv(file="file1.stats",check.names = F)
[1] 2659344201
<0 rows> (or 0-length row.names)
> read.csv(file="file2.stats",check.names = F)
[1] 92424242
<0 rows> (or 0-length row.names)
Combining them does not work:
file_list = list.files(pattern=".stats")
datalist = lapply(file_list, function(x){
  dat = read.csv(file=x,check.names = F)
})
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
joined <- join_all(dfs = datalist,by = "V1",type ="full" )
CodePudding user response:
The following should work, though it is untested because I don't have your files.
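library(data.table)
# pattern is a regular expression, so escape the dot and anchor the extension
file_list = list.files(pattern = "\\.stats$")
# read each file into a one-row table, then stack the tables
data_table = rbindlist(lapply(file_list, function(x) {
  fread(file = x)
}))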
rbindlist stacks the list into a single table, without going through the hassle of a join.
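The snippet above stacks the values but does not keep track of which file each one came from. A minimal sketch of one way to add the file name column, assuming every file holds exactly one number: name the list by file name and let rbindlist turn those names into a column through its idcol argument. The temporary directory, the demo files, and the column names "filename" and "value" are made up for illustration.

```r
library(data.table)

# Hypothetical demo files in a temporary directory (stand-ins for the real .stats files)
demo_dir <- tempfile("stats_demo")
dir.create(demo_dir)
writeLines("2659344201", file.path(demo_dir, "file1.stats"))
writeLines("92424242", file.path(demo_dir, "file2.stats"))

file_list <- list.files(demo_dir, pattern = "\\.stats$", full.names = TRUE)

# Read each file as a one-row table; header = FALSE keeps the number as data,
# and colClasses = "numeric" avoids mixing integer types across files
tables <- lapply(file_list, function(f) {
  fread(file = f, header = FALSE, col.names = "value", colClasses = "numeric")
})

# Name the list by file name so rbindlist() can emit the names as a column
names(tables) <- basename(file_list)
combined <- rbindlist(tables, idcol = "filename")
print(combined)
```

Here combined has one row per file, with the file name in the first column and the number in the second.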
CodePudding user response:
A solution based on purrr::map_dfr:
library(tidyverse)
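# create 10 single-number csv files in the /tmp directory
walk(1:10, ~ write(sample(1111111:9999999, 1), paste0("/tmp/file", .x, ".csv")))
# get the names of the files (pattern is a regular expression, not a glob)
files <- dir("/tmp/", "\\.csv$")
# read each file and bind the rows, adding the file name as a column
map_dfr(files, ~ data.frame(fname = .x, read.csv(paste0("/tmp/", .x), header = FALSE)))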
#> fname V1
#> 1 file1.csv 6803283
#> 2 file10.csv 4835472
#> 3 file2.csv 2645034
#> 4 file3.csv 9766210
#> 5 file4.csv 8570853
#> 6 file5.csv 7384528
#> 7 file6.csv 7609801
#> 8 file7.csv 1244294
#> 9 file8.csv 5098257
#> 10 file9.csv 4940697
Alternatively, using dplyr:
library(tidyverse)
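# create 10 single-number csv files in the /tmp directory
walk(1:10, ~ write(sample(1111111:9999999, 1), paste0("/tmp/file", .x, ".csv")))
# get the names of the files (pattern is a regular expression, not a glob)
files <- dir("/tmp/", "\\.csv$")
# one row per file; mutate() unpacks the one-column data frame returned by read.csv()
files %>%
  data.frame() %>%
  setNames("fnames") %>%
  rowwise() %>%
  mutate(read.csv(paste0("/tmp/", fnames), header = FALSE))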
#> # A tibble: 10 × 2
#> # Rowwise:
#> fnames V1
#> <chr> <int>
#> 1 file1.csv 3484087
#> 2 file10.csv 9333635
#> 3 file2.csv 1455252
#> 4 file3.csv 9665802
#> 5 file4.csv 8401813
#> 6 file5.csv 5864912
#> 7 file6.csv 9494831
#> 8 file7.csv 5230778
#> 9 file8.csv 9717400
#> 10 file9.csv 9761327
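For comparison, here is a base R sketch of the same idea that also touches the two problems visible in the question. read.csv defaults to header = TRUE, which is why the single number showed up as a column name with 0 rows, and read.table raises "no lines available in input" when a file is empty, which is probably what happened in the lapply call. The temporary directory, the demo files (including the empty one), and the column names are assumptions for illustration, not the asker's data.

```r
# Hypothetical demo files in a temporary directory (stand-ins for the real .stats files)
demo_dir <- tempfile("stats_demo")
dir.create(demo_dir)
writeLines("2659344201", file.path(demo_dir, "file1.stats"))
writeLines("92424242", file.path(demo_dir, "file2.stats"))
file.create(file.path(demo_dir, "empty.stats"))  # an empty file, to show the filter

paths <- list.files(demo_dir, pattern = "\\.stats$", full.names = TRUE)

# Drop empty files, which would otherwise stop the read with an error
paths <- paths[file.size(paths) > 0]

# Read exactly one number from each remaining file
values <- vapply(
  paths,
  function(p) scan(p, what = numeric(), n = 1, quiet = TRUE),
  numeric(1)
)

result <- data.frame(filename = basename(paths), value = unname(values))
print(result)
```

Using scan instead of read.csv sidesteps the header question entirely, at the cost of assuming that each file really contains just one numeric value.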