Apply a particular function in all files of a folder using R

Time:06-01

I have developed an R function named DNAdupstability for a biological analysis. It takes a FASTA file (.fasta/.txt) as input and returns a data frame in this format:

 Sequence Position8 Position9 Position10 Position11 Position12 Position13
1        1 -1.473571 -1.473571  -1.462143  -1.412143  -1.412143  -1.371429
  Position14 Position15 Position16 Position17 Position18 Position19 Position20
1  -1.372143       -1.4  -1.428571  -1.439286  -1.430714  -1.420714  -1.397143

This is an example data frame; it continues to n positions depending on the length of the input sequence. I have a folder named Random_fasta that contains 1333 different FASTA sequences of equal length. DNAdupstability gives the desired result (the data frame above) for a single FASTA sequence from Random_fasta, but now I want to run the same function on the other 1332 sequences and form a combined data frame in this format for all of them:

  Sequence Position8 Position9 Position10 Position11 Position12 Position13
1        1 -1.434286 -1.434286  -1.446429  -1.435714  -1.445714  -1.509286
2        2 -1.522143 -1.492143  -1.463571  -1.435714  -1.492857  -1.544286
3        3 -1.232857 -1.265000  -1.333571  -1.328571  -1.330000  -1.329286
4        4 -1.799286 -1.799286  -1.799286  -1.799286  -1.730714  -1.735714
5        5 -1.547143 -1.507143  -1.535714  -1.530714  -1.478571  -1.450714
  Position14 Position15 Position16 Position17 Position18 Position19 Position20
1  -1.452143  -1.402143  -1.390000  -1.457143  -1.509286  -1.498571  -1.458571
2  -1.544286  -1.544286  -1.544286  -1.544286  -1.601429  -1.715000  -1.755000
3  -1.340000  -1.328571  -1.333571  -1.344286  -1.384286  -1.446429  -1.486429
4  -1.667143  -1.605000  -1.536429  -1.486429  -1.536429  -1.605000  -1.600000
5  -1.450714  -1.450714  -1.412143  -1.372143  -1.434286  -1.531429  -1.615000

I want this so that I can calculate the position-wise mean, which will then be used for visualization with ggplot2. Is there a way to apply the same function to all the files in the folder using R and get the desired combined data frame? Any help will be greatly appreciated!

Answer:

One option is to list all the files under the main folder (recursively) with list.files, apply the custom function to each file with lapply, and combine the results into a single data.frame with do.call(rbind, ...):

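# List every .fasta/.txt file under the folder, including subfolders
files <- list.files('path/to/your/folder', recursive = TRUE,
  pattern = "\\.(fasta|txt)$", full.names = TRUE)
# Apply the function to each file, then row-bind the results
lst1 <- lapply(files, DNAdupstability)
out <- do.call(rbind, lst1)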
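Because DNAdupstability is not shown in the question, the sketch below uses a hypothetical stand-in (fake_stability, with a toy score that is not the real calculation) and writes three small temporary files so it can be run as-is. It also assumes each call returns Sequence = 1, as in the single-file output in the question, so the column is renumbered after combining.

```r
# Hypothetical stand-in for DNAdupstability: takes a file path and returns a
# one-row data frame with a Sequence column and one score per position.
fake_stability <- function(path) {
  seq_chars <- strsplit(readLines(path)[2], "")[[1]]
  # Toy score: 1 for G/C, 0 for A/T (NOT the real stability calculation)
  scores <- as.numeric(seq_chars %in% c("G", "C"))
  out <- data.frame(Sequence = 1, t(scores))
  names(out)[-1] <- paste0("Position", seq_along(scores))
  out
}

# Create three small FASTA files in a temporary folder for the demo
demo_dir <- file.path(tempdir(), "Random_fasta_demo")
dir.create(demo_dir, showWarnings = FALSE)
seqs <- c("ACGT", "GGCC", "ATAT")
for (i in seq_along(seqs)) {
  writeLines(c(paste0(">seq", i), seqs[i]),
             file.path(demo_dir, paste0("seq", i, ".fasta")))
}

# Same pipeline as in the answer, pointed at the demo folder
files <- list.files(demo_dir, pattern = "\\.(fasta|txt)$", full.names = TRUE)
lst1 <- lapply(files, fake_stability)
out <- do.call(rbind, lst1)

# Each call returned Sequence = 1, so renumber the rows after combining
out$Sequence <- seq_len(nrow(out))
rownames(out) <- NULL
print(out)
```

If the real function already numbers the sequences itself, the renumbering step can simply be dropped.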

Or we can use map_dfr from purrr, which applies the function to each file and row-binds the results into a single data.frame (map_dfr needs dplyr to be installed):

library(purrr)
out <- map_dfr(files, DNAdupstability)
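For the position-wise mean mentioned in the question, one possible follow-up is colMeans on the Position columns of the combined data frame. The sketch below uses a small stand-in for `out`, with values copied from the first two rows and first three Position columns of the question's example. The names pos_cols, pos_means and means_df are illustrative, and the long layout is just one shape that ggplot2 could consume.

```r
# Small stand-in for the combined data frame `out` (values copied from the
# first two rows of the question's example, first three positions only)
out <- data.frame(
  Sequence   = 1:2,
  Position8  = c(-1.434286, -1.522143),
  Position9  = c(-1.434286, -1.492143),
  Position10 = c(-1.446429, -1.463571)
)

# Select the Position columns by name and average each one across sequences
pos_cols <- grep("^Position", names(out), value = TRUE)
pos_means <- colMeans(out[, pos_cols, drop = FALSE])

# Long format: one row per position, convenient for plotting
means_df <- data.frame(
  Position = as.integer(sub("^Position", "", pos_cols)),
  Mean     = unname(pos_means)
)
print(means_df)
```

With ggplot2 loaded, something like ggplot(means_df, aes(Position, Mean)) + geom_line() should then draw the mean profile along the sequence.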