How to create rownames with name of input files for a loop in R

Time:07-26

I have a loop in R that goes through a large number of .tsv files and creates one output file with the results. Each row in the output file holds the results of processing one input file, in order. I need to be able to look back and work out which input file each row corresponds to, so I would like the row names of the output (big_data) to be the names of the input .tsv files. I have tried this in my loop, but it is not working. Below is my abbreviated loop, which works when I remove the rownames line.

files <- list.files(path =".", pattern = ".tsv")
files
datalist = list()
for(i in 1:length(files)) {  
  other_trait <- read.table(files[i])
  coloc_res = coloc::coloc.abf(dataset1 = other_trait, dataset2 = dataset2,p12 = 1e-5)
  coloc_results=matrix(ncol=6,nrow=1,0)
  coloc_results[1,]=coloc_res$summary
  write.csv(coloc_results, paste0("processed_", basename(files[i])))
  datalist[[i]] = coloc_results
  big_data = do.call(rbind, datalist)
  colnames(big_data)=c("n_snps","H0","H1","H2","H3","H4")
  rownames(big_data)= paste0(basename(files[i]))
  write.csv(big_data, "results.csv")
  
}

The line I am struggling with is rownames(big_data) = paste0 etc...

CodePudding user response:

The $ assignment used below requires coloc_results to be a data.frame rather than a plain matrix:

#create list of files (pattern is a regular expression, so escape the dot and anchor the end)
files <- list.files(path = ".", pattern = "\\.tsv$")

#create list to bind results to
datalist = list()

#loop through files
for(i in seq_along(files)) {
  #read table
  other_trait <- read.table(files[i])
  
  #desired analysis
  coloc_res <- coloc::coloc.abf(dataset1 = other_trait, dataset2 = dataset2,p12 = 1e-5)
  #store the summary as a one-row data.frame so a text column can be added to it
  coloc_results <- as.data.frame(matrix(coloc_res$summary, nrow = 1))
  
  #write results of analysis to individual file
  write.csv(coloc_results, paste0("processed_", basename(files[i])))

  #add column containing information regarding the inputfile
  coloc_results$inputfile <- basename(files[i])

  #add results of analysis to list
  datalist[[i]] = coloc_results 
}

#merge list to one data.frame
big_data <- do.call(rbind, datalist)

#tidy colnames
colnames(big_data) <- c("n_snps","H0","H1","H2","H3","H4", "inputfile")
 
#write to csv
write.csv(big_data, "results.csv")

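Note that do.call(rbind, datalist) is now outside the for loop: all results are added to the list first, and then the whole list is converted to one big data.frame. In your original code, big_data was rebuilt and results.csv was overwritten in every iteration. The rownames line failed because, from the second iteration on, big_data has i rows but was given only one name, basename(files[i]).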
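
If you would rather keep the file names as row names, as the question originally asked, a minimal sketch is to assign them once after the loop, when there is exactly one name per row. The summary function and file names below are hypothetical stand-ins (they are not part of coloc) so that the example runs on its own:

```r
# Hypothetical stand-in for coloc::coloc.abf(...)$summary:
# returns a numeric vector of length 6, like the real summary.
fake_summary <- function(file) {
  c(nchar(file), 0.1, 0.2, 0.3, 0.3, 0.1)
}

# Hypothetical input file names (in practice: the result of list.files())
files <- c("trait_a.tsv", "trait_b.tsv", "trait_c.tsv")

# Collect one result vector per file
datalist <- list()
for (i in seq_along(files)) {
  datalist[[i]] <- fake_summary(files[i])
}

# Stack the results into a matrix, one row per input file
big_data <- do.call(rbind, datalist)
colnames(big_data) <- c("n_snps", "H0", "H1", "H2", "H3", "H4")

# Assign all row names at once: rows are in the same order as files
rownames(big_data) <- basename(files)

print(big_data)
```

In the real loop the same last step would be rownames(big_data) <- basename(files) after do.call(rbind, datalist), assuming every file contributed exactly one row.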
