How to combine multiple .txt files with different # of rows in R and keep file names?

Time:05-26

The goal is to read multiple single-column .txt files from different subfolders and cbind them into one data frame, so that each file becomes one column and the file names are kept as the column names. An example of one .txt file:

       0.348107
       0.413864
       0.285974
       0.130399
       ...

My code:

# list all the .txt files in the folder
# recursive = TRUE also searches subdirectories; set it to FALSE to skip them
listfile <- list.files(path = "",
                       pattern = "\\.txt$", full.names = TRUE, recursive = TRUE)

#extract the files with folder name aINS
listfile_aINS <- listfile[grep("aINS",listfile)]

#inspect file names
head(listfile_aINS)

# combine all the text files in listfile_aINS and store them in the data frame 'Data'
for (i in seq_along(listfile_aINS)) {
  if (i == 1) {
    Data <- read.table(listfile_aINS[i], header = FALSE, sep = ",")
  } else {
    Test <- read.table(listfile_aINS[i], header = FALSE, sep = ",")
    Data <- cbind(Data, Test) # choose one: cbind combines by column; rbind combines by row
    rm(Test)
  }
}

rm(list = ls(pattern = "list. ?"))

I ran into two problems:

  1. R returns this error because the .txt files have different # of rows.

"Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 37, 36"

I have too many files to edit by hand, so I am hoping to work around the error without having to make every file the same length.

  2. My code does not keep the file name as the column name.

CodePudding user response:

It will be easier to write a function and then rbind() the data from each file. The resulting data frame will have a file column with the filename from the listfile_aINS vector.

read_file <- function(filename) {
  dat <- read.table(filename,header = FALSE, sep = ",")
  dat$file <- filename
  return(dat)
}

all_dat <- do.call(rbind, lapply(listfile_aINS, read_file))
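
To show what this long format looks like, here is a minimal self-contained sketch. The temporary folder, the two demo files, and their values are made up for illustration, and basename() is used (an assumption, not part of the code above) so the file column holds just the file name rather than the full path:

```r
# Hypothetical demo data: two single-column files of different lengths
demo_dir <- file.path(tempdir(), "aINS_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("0.348107", "0.413864", "0.285974"), file.path(demo_dir, "a.txt"))
writeLines(c("0.130399", "0.220011"), file.path(demo_dir, "b.txt"))

demo_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

read_file <- function(filename) {
  dat <- read.table(filename, header = FALSE, sep = ",")
  dat$file <- basename(filename)  # keep only the file name, not the full path
  dat
}

# rbind() does not care that the files have different numbers of rows
all_dat <- do.call(rbind, lapply(demo_files, read_file))
str(all_dat)         # columns V1 (numeric) and file (character)
table(all_dat$file)  # number of rows contributed by each file
```

Because the rows are stacked rather than placed side by side, the differing lengths never trigger the "differing number of rows" error.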

If the files don't all have the same number of rows, it might not make sense to have each column be a file. If you really want that, you can reshape the data into a wide dataset, with NA filling out the rows that the shorter files lack:

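library(dplyr)
library(tidyr)
all_dat %>% 
  group_by(file) %>% 
  mutate(n = row_number()) %>%  # row index within each file
  ungroup() %>% 
  pivot_wider(names_from = file, values_from = V1)  # one column per file, NA where a file is shorter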
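
If you would rather avoid the extra packages, the same wide layout can probably be built in base R by padding each column with NA up to the longest length before combining. This is only a sketch: the list cols and its values are hypothetical stand-ins for the columns read from the files.

```r
# Hypothetical stand-in for the per-file columns; with real files this could be
#   cols <- lapply(listfile_aINS, function(f) read.table(f, header = FALSE, sep = ",")$V1)
#   names(cols) <- basename(listfile_aINS)
cols <- list(a.txt = c(0.348107, 0.413864, 0.285974),
             b.txt = c(0.130399, 0.220011))

max_len <- max(lengths(cols))

# Assigning a larger length() pads the vector with NA at the end
padded <- lapply(cols, function(x) {
  length(x) <- max_len
  x
})

# All columns now have the same length, so they can sit side by side;
# check.names = FALSE keeps the file names unchanged as column names
wide <- data.frame(padded, check.names = FALSE)
wide
```

Here the list names become the column names, which addresses the second problem, and the NA padding avoids the first.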