I have developed an R function named DNAdupstability for a biological analysis. It takes a FASTA file (.fasta/.txt) as input and returns a data frame in this format:
Sequence Position8 Position9 Position10 Position11 Position12 Position13
1 1 -1.473571 -1.473571 -1.462143 -1.412143 -1.412143 -1.371429
Position14 Position15 Position16 Position17 Position18 Position19 Position20
1 -1.372143 -1.4 -1.428571 -1.439286 -1.430714 -1.420714 -1.397143
This is an example data frame; it continues to n positions depending on the length of the input sequence. I have a folder named Random_fasta that contains 1333 different FASTA sequences of equal length. DNAdupstability gives the desired output (the data frame above) for a single FASTA file from Random_fasta, but now I want to run the same function on the other 1332 sequences and build a combined data frame in this format for all of them:
Sequence Position8 Position9 Position10 Position11 Position12 Position13
1 1 -1.434286 -1.434286 -1.446429 -1.435714 -1.445714 -1.509286
2 2 -1.522143 -1.492143 -1.463571 -1.435714 -1.492857 -1.544286
3 3 -1.232857 -1.265000 -1.333571 -1.328571 -1.330000 -1.329286
4 4 -1.799286 -1.799286 -1.799286 -1.799286 -1.730714 -1.735714
5 5 -1.547143 -1.507143 -1.535714 -1.530714 -1.478571 -1.450714
Position14 Position15 Position16 Position17 Position18 Position19 Position20
1 -1.452143 -1.402143 -1.390000 -1.457143 -1.509286 -1.498571 -1.458571
2 -1.544286 -1.544286 -1.544286 -1.544286 -1.601429 -1.715000 -1.755000
3 -1.340000 -1.328571 -1.333571 -1.344286 -1.384286 -1.446429 -1.486429
4 -1.667143 -1.605000 -1.536429 -1.486429 -1.536429 -1.605000 -1.600000
5 -1.450714 -1.450714 -1.412143 -1.372143 -1.434286 -1.531429 -1.615000
I want this so that I can calculate the position-wise mean, which will then be used for visualization with ggplot2. Is there a way to apply the same function to all the files in the folder using R and get the combined data frame? Any help will be greatly appreciated!
Answer:
One option is to list all the files in the main folder recursively with list.files, apply the custom function to each file with lapply, and combine the results into a single data.frame with do.call(rbind, ...):
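# List every FASTA file under the folder (recursively), with full paths;
# the pattern matches both .fasta and .txt, the extensions named in the question
files <- list.files('path/to/your/folder', recursive = TRUE,
                    pattern = "\\.(fasta|txt)$", full.names = TRUE)
# Run the function on each file, then stack the one-file results row-wise
lst1 <- lapply(files, DNAdupstability)
out <- do.call(rbind, lst1)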
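As a minimal sketch of the loop above: if DNAdupstability returns a one-row data frame with Sequence set to 1 for every file (an assumption based on the single-file output in the question), the Sequence column can be renumbered after binding so that each row stays identifiable. The stand-in function `fake_stability`, its made-up values, and the temporary demo files are all hypothetical and only serve to make the example runnable.

```r
# Hypothetical stand-in for DNAdupstability: returns a one-row data frame
# with a Sequence column and two Position columns (values are made up).
fake_stability <- function(path) {
  seq_chars <- strsplit(readLines(path)[2], "")[[1]]
  data.frame(Sequence  = 1,
             Position8 = -mean(seq_chars == "A"),
             Position9 = -mean(seq_chars == "C"))
}

# Write three small FASTA-like files to a temporary demo folder
tmp_dir <- file.path(tempdir(), "Random_fasta_demo")
dir.create(tmp_dir, showWarnings = FALSE)
seqs <- c("ACGTACGT", "AAAACCCC", "GGGGTTTT")
for (i in seq_along(seqs)) {
  writeLines(c(paste0(">seq", i), seqs[i]),
             file.path(tmp_dir, paste0("seq", i, ".txt")))
}

# Same pattern as above: list the files, apply the function, bind the rows
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
lst1 <- lapply(files, fake_stability)
out <- do.call(rbind, lst1)

# Renumber the rows and record which file each row came from
out$Sequence <- seq_len(nrow(out))
out$File <- basename(files)
rownames(out) <- NULL
```

Keeping the file name next to each row makes it easier to trace an odd value back to its input sequence.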
Or we can use map_dfr from purrr, which applies the function to each file and row-binds the outputs from the list into a single data.frame:
library(purrr)
out <- map_dfr(files, DNAdupstability)
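For the position-wise mean mentioned in the question, one possible follow-up is sketched below in base R. It assumes the combined data frame has a Sequence column plus columns named Position8, Position9, and so on, as in the question's output. The small `out` here is a toy stand-in built from the first three rows and two columns shown in the question, and `mean_df` is a hypothetical name.

```r
# Toy stand-in for the combined data frame (values taken from the question)
out <- data.frame(
  Sequence  = 1:3,
  Position8 = c(-1.434286, -1.522143, -1.232857),
  Position9 = c(-1.434286, -1.492143, -1.265000)
)

# Pick out the Position columns and take the mean of each column
pos_cols  <- grep("^Position", names(out), value = TRUE)
pos_means <- colMeans(out[, pos_cols, drop = FALSE])

# Long format: one row per position, which is convenient for ggplot2
mean_df <- data.frame(
  Position = as.integer(sub("^Position", "", pos_cols)),
  Mean     = unname(pos_means)
)
```

`mean_df` could then be passed to ggplot2, for example `ggplot(mean_df, aes(Position, Mean)) + geom_line()`.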