I have 2 folders (folder A and folder B), each containing about 900 .csv files. I want to open one csv file from folder A and one csv file from folder B and do some calculations with them. The result (just a single numerical statistic) should then be saved in a separate list.
After that, the two imported csv files should be removed and I move on to the next pair: the next file from folder A and the next file from folder B.
The pairing is like: 1_1_Alpha.csv from folder A with 1_1_Beta.csv from folder B -> 3_1_Alpha.csv from folder A with 3_1_Beta.csv from folder B and so on...
Does anyone know if this is possible? Is there a package to iterate through 2 files simultaneously? How do I program this? I would be glad for any help!
CodePudding user response:
I think mapply is useful here. The intent is to iterate over each of the "A" files together with its corresponding "B" file. Order and set membership (file existence) are critical; otherwise the summary statistic may be silently misleading.
Afiles <- sort(list.files("A", pattern = "\\.csv$", full.names = TRUE))
Bfiles <- sort(list.files("B", pattern = "\\.csv$", full.names = TRUE))
## double check file match between the two
Abase <- gsub("Alpha", "", basename(Afiles))
Bbase <- gsub("Beta", "", basename(Bfiles))
AnotB <- !Abase %in% Bbase
if (any(AnotB)) {
  warning("files in 'A' not in 'B': ", paste(sQuote(Afiles[AnotB], FALSE), collapse = ", "))
  Afiles <- Afiles[!AnotB]
  Abase <- Abase[!AnotB]  # keep Abase aligned with Afiles for the ordering step
}
BnotA <- !Bbase %in% Abase
if (any(BnotA)) {
  warning("files in 'B' not in 'A': ", paste(sQuote(Bfiles[BnotA], FALSE), collapse = ", "))
  Bfiles <- Bfiles[!BnotA]
  Bbase <- Bbase[!BnotA]  # keep Bbase aligned with Bfiles for the ordering step
}
## ensure the same order
Afiles <- Afiles[order(Abase)]
Bfiles <- Bfiles[order(Bbase)]
## one final check ... they need to match
stopifnot(all(gsub("Alpha", "", basename(Afiles)) == gsub("Beta", "", basename(Bfiles))))
ABstats <- mapply(function(afile, bfile) {
  ## read one pair at a time, so only two files are held in memory at once
  ax <- read.csv(afile)
  bx <- read.csv(bfile)
  ## some statistic
  nrow(ax) - nrow(bx)
}, Afiles, Bfiles)
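If an explicit loop reads more naturally, the same pairing can be sketched as below. This is only an illustration under assumptions: the temporary folders, the column name x, and the difference in means are all made up for the demo and stand in for your real folders and statistic.

```r
## Hypothetical demo data: two temporary folders with small CSV files
dirA <- tempfile("A")
dirB <- tempfile("B")
dir.create(dirA)
dir.create(dirB)
for (id in c("1_1", "3_1")) {
  write.csv(data.frame(x = 1:4), file.path(dirA, paste0(id, "_Alpha.csv")), row.names = FALSE)
  write.csv(data.frame(x = 1:3), file.path(dirB, paste0(id, "_Beta.csv")), row.names = FALSE)
}

Afiles <- sort(list.files(dirA, pattern = "\\.csv$", full.names = TRUE))
Bfiles <- sort(list.files(dirB, pattern = "\\.csv$", full.names = TRUE))

## One result per pair, preallocated
pair_stats <- numeric(length(Afiles))
for (i in seq_along(Afiles)) {
  ax <- read.csv(Afiles[i])
  bx <- read.csv(Bfiles[i])
  pair_stats[i] <- mean(ax$x) - mean(bx$x)  # placeholder statistic
  rm(ax, bx)                                # drop the pair before reading the next one
}

## Label each result with the shared file ID (e.g. "1_1")
names(pair_stats) <- sub("_Alpha\\.csv$", "", basename(Afiles))
pair_stats
```

The rm() call is mostly for clarity, since ax and bx are overwritten on the next iteration anyway.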
Another option (for same-file-pairing) would be something like:
Afiles <- list.files("A", pattern = "\\.csv$", full.names = TRUE)
Bfiles <- gsub("^A/", "B/", gsub("Alpha\\.csv$", "Beta.csv", Afiles))
keep <- file.exists(Bfiles)
Afiles <- Afiles[keep]
Bfiles <- Bfiles[keep]
Though this does not "alarm" when B files exist without matching A.
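If you want that alarm as well, one possible addition is to compare the B files that actually exist against the ones implied by the A files. A minimal sketch, with hard-coded, hypothetical file names standing in for the list.files() output (Bfiles_all and Borphans are names I made up here):

```r
## Hypothetical file names standing in for list.files() output
Afiles <- c("A/1_1_Alpha.csv", "A/3_1_Alpha.csv")
Bfiles_all <- c("B/1_1_Beta.csv", "B/3_1_Beta.csv", "B/5_1_Beta.csv")

## B files that would be expected given the A files
Bexpected <- file.path("B", sub("Alpha\\.csv$", "Beta.csv", basename(Afiles)))

## B files with no A counterpart
Borphans <- setdiff(Bfiles_all, Bexpected)
if (length(Borphans)) {
  warning("files in 'B' not in 'A': ", paste(sQuote(Borphans, FALSE), collapse = ", "))
}
```

With real data, Bfiles_all would come from list.files("B", pattern = "\\.csv$", full.names = TRUE).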