I have multiple CSV files that are stored in a specific order, and I want to read them in this exact same order, from the bottom to the top. They are stored like this:
tFile20.RAW
tFile17.RAW
tFile16.RAW
tFile12.RAW
tFile11.RAW
tFile10.RAW
.
.
and so on until tFile1.RAW. I've seen multiple questions about this issue, but all of them were about Python.
I'm using this code, but it is reading the files in a random order (the CSVs are stored in smallfolder):
temp = list.files(path = '/bigfolder/myname/smallfolder', pattern="RAW", full.names = TRUE)
final_list = lapply(temp, read.csv)
It's reading tFile1.RAW, then jumps to tFile10.RAW and tFile11.RAW, and so on.
How can I make it read starting from tFile1.RAW and go to the top? The first CSV file it reads would be tFile1.RAW, hence final_list[[1]] = tFile1.RAW, then final_list[[2]] = tFile2.RAW, final_list[[3]] = tFile3.RAW, and so on.
CodePudding user response:
library(stringr)
Preparing the folder structure and writing files
# Creating folder
folder_path <- "bigfolder/myname/smallfolder"
dir.create(folder_path, recursive = TRUE)
# Files
files <- c("file1.csv", "file10.csv", "file11.csv", "file12.csv", "file13.csv",
"file14.csv", "file15.csv", "file16.csv", "file17.csv", "file18.csv",
"file19.csv", "file2.csv", "file20.csv", "file3.csv", "file4.csv",
"file5.csv", "file6.csv", "file7.csv", "file8.csv", "file9.csv"
)
# writing files
lapply(files, \(x) write.csv(x, file.path(folder_path, x)))
With that I have a folder structure like the one in your code. Now I will
list all the files I'm going to read. The only difference here is that I will use
full.names = FALSE (the default), because I think that on your local machine the path has
numbers in it, and those would interfere with extracting the number from the file name.
temp <- list.files(folder_path)
You have to sort the files after you use the list.files
function. I would do it as follows:
- Extract the integer in the name of the file
file_number <- stringr::str_extract(temp, "[0-9]+") |> as.numeric()
- Get the position where each file should be, comparing the ordered file_number with the position they actually have
correct_index_order <- sapply(sort(file_number), \(x) which(file_number == x))
- Rearrange your temp vector with that new index vector
temp <- temp[correct_index_order]
temp
#> [1] "file1.csv" "file2.csv" "file3.csv" "file4.csv" "file5.csv"
#> [6] "file6.csv" "file7.csv" "file8.csv" "file9.csv" "file10.csv"
#> [11] "file11.csv" "file12.csv" "file13.csv" "file14.csv" "file15.csv"
#> [16] "file16.csv" "file17.csv" "file18.csv" "file19.csv" "file20.csv"
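The same reordering can also be sketched in base R with order(), which returns the sorting permutation directly and so stands in for the sapply()/which() step. This is only a sketch: it assumes each file name contains exactly one run of digits (as in tFile1.RAW), and the temporary folder and the tFile*.RAW names created here are made up for illustration.

```r
# Illustrative setup: a throwaway folder holding a few empty tFile<N>.RAW files
demo_dir <- file.path(tempdir(), "smallfolder_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- paste0("tFile", c(1, 2, 10, 11, 20), ".RAW")
invisible(file.create(file.path(demo_dir, demo_names)))

# List with full paths, as in the question
paths <- list.files(demo_dir, pattern = "\\.RAW$", full.names = TRUE)

# Keep only the digits of the file name itself; basename() drops the
# directory part, so numbers in the path cannot interfere
file_number <- as.numeric(gsub("[^0-9]", "", basename(paths)))

# order() gives the index vector that sorts file_number ascending
paths <- paths[order(file_number)]

basename(paths)
```

Because basename() is applied before the digits are extracted, this works with full.names = TRUE, so the sorted paths can be passed straight to lapply(paths, read.csv).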
Now we can read the files
lapply(file.path(folder_path, temp), read.csv)
#> [[1]]
#> X x
#> 1 1 file1.csv
#>
#> [[2]]
#> X x
#> 1 1 file2.csv
#>
#> [[3]]
#> X x
#> 1 1 file3.csv
#>
#> [[4]]
#> X x
#> 1 1 file4.csv
#>
#> [[5]]
#> X x
#> 1 1 file5.csv
#>
#> [[6]]
#> X x
#> 1 1 file6.csv
#>
Created on 2023-01-14 with reprex v2.0.2
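If it helps to check which list element came from which file, the result can be named after the files. A minimal sketch, assuming a vector of paths that is already in the desired order; the folder and the two small CSV files written here are placeholders, not your data.

```r
# Placeholder data: two tiny CSV files in a throwaway folder
demo_dir <- file.path(tempdir(), "named_list_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(value = 1), file.path(demo_dir, "tFile1.RAW"), row.names = FALSE)
write.csv(data.frame(value = 2), file.path(demo_dir, "tFile2.RAW"), row.names = FALSE)

# Paths assumed to be sorted numerically already
sorted_paths <- file.path(demo_dir, c("tFile1.RAW", "tFile2.RAW"))

# Read each file and label the list element with its file name
final_list <- lapply(sorted_paths, read.csv)
names(final_list) <- basename(sorted_paths)

names(final_list)
final_list[["tFile2.RAW"]]
```

With names attached, final_list[[1]] and final_list[["tFile1.RAW"]] refer to the same data frame, which makes it easy to confirm the read order.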