Read CSV files in the same order as saved in the path, in R

Time:01-14

I have multiple CSV files that are stored in a specific order, and I want to read them in this exact same order, from the bottom to the top. They are stored like this:

tFile20.RAW
tFile17.RAW
tFile16.RAW
tFile12.RAW
tFile11.RAW
tFile10.RAW
.
.

and so on until tFile1.RAW. I've seen multiple questions about this issue, but all of them were about Python.

I'm using this code, but it is reading the files in a random order (the CSVs are stored in smallfolder):

temp = list.files(path = '/bigfolder/myname/smallfolder', pattern="RAW", full.names = TRUE)
final_list = lapply(temp, read.csv)

It reads tFile1.RAW and then jumps to tFile10.RAW, tFile11.RAW, and so on.

How can I make it start from tFile1.RAW and work up to the top? The first CSV file it reads should be tFile1.RAW, so that final_list[[1]] = tFile1.RAW, final_list[[2]] = tFile2.RAW, final_list[[3]] = tFile3.RAW, and so on.

CodePudding user response:

library(stringr)

Preparing the folder structure and writing files

# Creating folder 
folder_path <- "bigfolder/myname/smallfolder"
dir.create(folder_path, recursive = TRUE)

# Files
files <- c("file1.csv", "file10.csv", "file11.csv", "file12.csv", "file13.csv", 
  "file14.csv", "file15.csv", "file16.csv", "file17.csv", "file18.csv", 
  "file19.csv", "file2.csv", "file20.csv", "file3.csv", "file4.csv", 
  "file5.csv", "file6.csv", "file7.csv", "file8.csv", "file9.csv"
)

# writing files
lapply(files, \(x) write.csv(x, file.path(folder_path, x)))

With that, I have a folder structure like the one in your code. Next, I list all the files I am going to read. The only difference is that I use full.names = FALSE (the default), because I suspect the path on your local machine contains digits, and those would interfere with extracting the number from the file name.

temp <- list.files(folder_path)
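
For context, list.files() returns names in alphabetical order, so they are compared character by character rather than by numeric value. A small sketch of the effect (the three names are illustrative, and method = "radix" is used here only so the result does not depend on the locale):

```r
# Illustrative names only; nothing is read from disk
x <- c("tFile1.RAW", "tFile2.RAW", "tFile10.RAW")

# Character-by-character comparison puts "10" before "2",
# because "1" sorts before "2" at the first differing position
alphabetical <- sort(x, method = "radix")
alphabetical
#> [1] "tFile1.RAW"  "tFile10.RAW" "tFile2.RAW"
```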

You have to sort the files after calling list.files. I would do it as follows:

  1. Extract the integer in the name of the file
file_number <- stringr::str_extract(temp, "[0-9]+") |> as.numeric()
  2. Get the position where each file should be, by comparing the sorted file_number with the positions the files currently have
correct_index_order <- sapply(sort(file_number), \(x) which(file_number == x))
  3. Rearrange your temp vector with that new index vector
temp <- temp[correct_index_order]

temp
#>  [1] "file1.csv"  "file2.csv"  "file3.csv"  "file4.csv"  "file5.csv" 
#>  [6] "file6.csv"  "file7.csv"  "file8.csv"  "file9.csv"  "file10.csv"
#> [11] "file11.csv" "file12.csv" "file13.csv" "file14.csv" "file15.csv"
#> [16] "file16.csv" "file17.csv" "file18.csv" "file19.csv" "file20.csv"
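
Steps 1 to 3 can also be collapsed into a single call to base R's order(), which returns the permutation that sorts its argument. This is a minimal sketch rather than the method used above; it assumes every file name contains exactly one run of digits, and files_demo is a hypothetical stand-in for temp:

```r
# Hypothetical stand-in for the output of list.files(), in alphabetical order
files_demo <- c("file1.csv", "file10.csv", "file11.csv", "file2.csv", "file3.csv")

# Strip everything that is not a digit, then convert to numeric
file_number <- as.numeric(gsub("[^0-9]", "", files_demo))

# order() gives the indices that put file_number in ascending order
files_sorted <- files_demo[order(file_number)]
files_sorted
#> [1] "file1.csv"  "file2.csv"  "file3.csv"  "file10.csv" "file11.csv"
```

Unlike the which() comparison, order() also copes with repeated numbers, which may matter if two files share the same index.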

Now we can read the files

lapply(file.path(folder_path, temp), read.csv)
#> [[1]]
#>   X         x
#> 1 1 file1.csv
#> 
#> [[2]]
#>   X         x
#> 1 1 file2.csv
#> 
#> [[3]]
#>   X         x
#> 1 1 file3.csv
#> 
#> [[4]]
#>   X         x
#> 1 1 file4.csv
#> 
#> [[5]]
#>   X         x
#> 1 1 file5.csv
#> 
#> [[6]]
#>   X         x
#> 1 1 file6.csv
#> 
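
For the layout in the question (full paths and a .RAW extension), something along these lines may work. It takes the number from basename(), so digits elsewhere in the path are ignored. The temporary folder, the id column, and the three sample files are assumptions made only to keep the sketch self-contained; the tFile prefix and the .RAW extension come from the question.

```r
# Self-contained setup: a throwaway folder with a few sample files
demo_dir <- file.path(tempdir(), "smallfolder_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in c(1, 2, 10)) {
  write.csv(data.frame(id = i),
            file.path(demo_dir, paste0("tFile", i, ".RAW")),
            row.names = FALSE)
}

# Full paths, as in the question; "\\.RAW$" matches only the extension
paths <- list.files(demo_dir, pattern = "\\.RAW$", full.names = TRUE)

# Extract the number from the file name only, not from the whole path
file_number <- as.numeric(gsub("[^0-9]", "", basename(paths)))
paths <- paths[order(file_number)]

# Read in numeric order and name each element after its file
final_list <- lapply(paths, read.csv)
names(final_list) <- basename(paths)
names(final_list)
#> [1] "tFile1.RAW"  "tFile2.RAW"  "tFile10.RAW"
```

Naming the list elements makes it easy to check that final_list[[1]] really came from tFile1.RAW.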


Created on 2023-01-14 with reprex v2.0.2
