How to loop over different files and save the output with the filename in R?

I have about 100 files in txt format with names such as RTDFE, TRYFG, FTYGS, WERTS, and so on. For each file, I run the following code and write the output to a file.

name = c("RTDFE")

file1 <- paste0(name, "_filter",".txt")
file2 <- paste0(name, "_data",".txt")
  
### One
  
A <- read.delim(file1, sep = "\t", header = FALSE)

#### two
  
B <- read.delim(file2, sep = "\t", header = FALSE)

C <- merge(A, B, by="XYZ")
nrow(C)
145

Output:

Samples    Common
 RTDFE      145

Each time, I assign a file name to the variable name, run my code, and write the output to the file. Instead, I want the code to run on all the files in one go and produce the following output. Common is the number of rows of the merged data frame C.

The output I need:

Samples    Common
 RTDFE      145
 TRYFG      ...
 FTYGS      ...
 WERTS      ...

How can I do this? Any help is appreciated.

CodePudding user response：

How about putting all your names in a single vector, called names, like this:

names<-c("TRYFG","RTDFE",...)

and then feeding each one to a function that reads the two files, merges them, and returns the number of rows:

f<-function(n) {
    fs = paste0(n,c("_filter", "_data"),".txt")
    C = merge(
        read.delim(fs[1],sep="\t", header=F),
        read.delim(fs[2],sep="\t", header=F), by="XYZ")
    data.frame(Samples=n,Common=nrow(C))
}

Then just call this function f on each of the values in names, row-binding the results together:

do.call(rbind, lapply(names, f))
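
The question also asks for the result to be saved, so here is a minimal end-to-end sketch of the approach above, run on made-up files in a temporary folder. Note that with header = FALSE, read.delim names the columns V1, V2, ..., so by = "XYZ" can only match when the header row is read. I am assuming XYZ stands for a real column name in a header line. The function name count_common, the file contents, and the output name common_counts.txt are all hypothetical.

```r
# Work in a temporary folder so nothing real is overwritten
old_wd <- setwd(tempdir())

# Hypothetical sample files: each has a header row with a key column "XYZ"
write.table(data.frame(XYZ = c("a", "b", "c"), score = 1:3),
            "RTDFE_filter.txt", sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(XYZ = c("b", "c", "d"), value = 4:6),
            "RTDFE_data.txt", sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(XYZ = c("a", "b"), score = 1:2),
            "TRYFG_filter.txt", sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(XYZ = c("b", "z"), value = 3:4),
            "TRYFG_data.txt", sep = "\t", row.names = FALSE, quote = FALSE)

# Same idea as f above, but reading the header so that "XYZ" is a column name
count_common <- function(n) {
  fs <- paste0(n, c("_filter", "_data"), ".txt")
  A <- read.delim(fs[1], sep = "\t", header = TRUE)
  B <- read.delim(fs[2], sep = "\t", header = TRUE)
  data.frame(Samples = n, Common = nrow(merge(A, B, by = "XYZ")))
}

sample_names <- c("RTDFE", "TRYFG")
res <- do.call(rbind, lapply(sample_names, count_common))

# Save the combined table (the output file name is made up)
write.table(res, "common_counts.txt", sep = "\t", row.names = FALSE, quote = FALSE)

# Go back to the original working directory
setwd(old_wd)
```

write.table with row.names = FALSE and quote = FALSE gives a plain tab-separated file with the Samples and Common columns, which is the layout shown in the question.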

An easy way to create the vector names is like this:

# escape the dot and anchor at the end so only the exact suffixes match
p = "_(filter|data)\\.txt$"
names = unique(gsub(p,"",list.files(pattern = p)))
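
To see what such a pattern does without touching a real folder, here is a small sketch on a made-up vector of file names. The vector found and the extra file notes.txt are assumptions for illustration. The pattern escapes the dot and anchors at the end of the name.

```r
# Hypothetical file listing, as list.files() might return it
found <- c("FTYGS_data.txt", "FTYGS_filter.txt", "RTDFE_data.txt",
           "RTDFE_filter.txt", "notes.txt")

# Suffix pattern: "_filter.txt" or "_data.txt" at the end of the name
p <- "_(filter|data)\\.txt$"

# Keep only names that carry one of the two suffixes (drops "notes.txt")
paired <- grep(p, found, value = TRUE)

# Remove the suffix and collapse the duplicates
sample_names <- unique(sub(p, "", paired))
sample_names
```

Each sample shows up twice after the suffix is removed (once per file), which is why unique is needed.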

CodePudding user response：

I am making some assumptions here. The first is that all these files are in one folder that contains no other text files (.txt). If so, you can get the list of files with list.files. That list contains both the "_data.txt" and the "_filter.txt" names, so we need a way to extract the base part of each name. I use str_replace to remove "_data.txt" and "_filter.txt" from the names. Each base name then appears twice, so I apply unique. I store the result in lfiles, which now contains "RTDFE, TRYFG, FTYGS, WERTS..." and any other name that satisfies the conditions.

After this, I run a for loop over this list. I read the files the same way you do, merge by XYZ, and immediately put the result in a data frame. By using rbind, I keep adding results to the data frame res.

library(stringr)

lfiles=list.files(path = ".", pattern = "\\.txt$")

## strip the "_filter.txt" and "_data.txt" suffixes from the file names
lfiles=unique( sapply(lfiles, function(x){
  x=str_replace(x, "_data.txt", "")
  x=str_replace(x, "_filter.txt", "") 
  return(x)
} ))
  

res=NULL
for(i in lfiles){
  
  file1 <- paste0(i, "_filter.txt")
  file2 <- paste0(i, "_data.txt")
  
  ### One
  
  A <- read.delim(file1, sep = "\t", header = FALSE)
  
  #### two
  
  B <- read.delim(file2, sep = "\t", header = FALSE)
  
  ## append this sample's row to the accumulated results
  res=rbind(res, data.frame(Samples=i, Common=nrow(merge(A, B, by="XYZ"))))
  
}
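
The loop assumes that every name has both a "_filter.txt" and a "_data.txt" file. A sketch of a guard for incomplete pairs, which could be applied to the list of names before looping. The helper has_pair, the folder pair_demo, and the empty demo files are hypothetical.

```r
# Hypothetical helper: TRUE when both files for a sample exist in `dir`
has_pair <- function(sample, dir = ".") {
  paths <- file.path(dir, paste0(sample, c("_filter.txt", "_data.txt")))
  all(file.exists(paths))
}

# Demo in a temporary folder with one complete and one incomplete pair
demo_dir <- file.path(tempdir(), "pair_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("RTDFE_filter.txt", "RTDFE_data.txt",
                                  "TRYFG_filter.txt")))

candidates <- c("RTDFE", "TRYFG")

# Keep only the samples for which both files are present
complete <- Filter(function(s) has_pair(s, demo_dir), candidates)
skipped  <- setdiff(candidates, complete)

if (length(skipped) > 0) {
  message("Skipping incomplete pairs: ", paste(skipped, collapse = ", "))
}
```

Filtering first keeps read.delim from stopping the whole run with an error when a single file is missing.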

CodePudding user response：

OK, I will assume you have a folder called "data" with files named "RTDFE_filter.txt, RTDFE_data.txt, TRYFG_filter.txt, TRYFG_data.txt", etc. (only and exactly these files).

This code shows one possible way:

# save the file names
# full.names = TRUE keeps the "data/" prefix so the files can be read from here
files = list.files("data", full.names = TRUE)

# get indexes for "data" (for "filter" indexes, add 1)
files_data_index = seq(1, length(files), 2) # 1, 3, 5, ...

# loop on indexes
results = lapply(files_data_index, function(i) {
    A <- read.delim(files[i + 1], sep = "\t", header = FALSE)
    B <- read.delim(files[i],     sep = "\t", header = FALSE)
    C <- merge(A, B, by="XYZ")

    # sample name = part of the file name before the first "_"
    samp = strsplit(basename(files[i]), "_")[[1]][1]
    com  = nrow(C)

    return(data.frame(Samples = samp, Common = com))
})

# combine results
do.call(rbind, results)
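
The index arithmetic relies on list.files returning the names sorted so that each "_data" file is directly followed by its "_filter" file. A sketch that groups the files by sample prefix instead, under the same naming assumption. The listing below is made up so the example runs without a real "data" folder.

```r
# Hypothetical listing; with a real folder use list.files("data", full.names = TRUE)
files <- c("data/RTDFE_data.txt", "data/RTDFE_filter.txt",
           "data/TRYFG_data.txt", "data/TRYFG_filter.txt")

# Sample prefix = file name without the folder and without the suffix
prefix <- sub("_(filter|data)\\.txt$", "", basename(files))

# One list element per sample, each holding that sample's file paths
pairs <- split(files, prefix)

# Every sample should have exactly two files; stop early otherwise
stopifnot(all(lengths(pairs) == 2))

pairs$RTDFE
```

With the files grouped this way, a stray or missing file is caught by the length check instead of silently shifting every later pair by one position.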