How to sort a list of csvs loaded into R by certain aspects of their filenames

Time:12-21

The code in this question and the datasets it uses can be found in my GitHub repository. The screenshot shows the order in which the datasets are currently listed.

As you can see, they are out of order. R is sorting the names as text, character by character, so in the last of the four descriptors "10" comes before "2".

How can I reorder their names to be arranged correctly as in the attached photo?

It was previously suggested here that I use this:

# strip the directory and the .csv extension from each file path
DS_names_list <- basename(filepaths_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)
> DS_names_list
 [1] "0-3-1-1"  "0-3-1-10" "0-3-1-11" "0-3-1-12" "0-3-1-13" "0-3-1-14" "0-3-1-15" "0-3-1-16"
 [9] "0-3-1-17" "0-3-1-18" "0-3-1-19" "0-3-1-2"  "0-3-1-20" "0-3-1-3"  "0-3-1-4"  "0-3-1-5" 
[17] "0-3-1-6"  "0-3-1-7"  "0-3-1-8"  "0-3-1-9"

But changing DS_names_list does not reorder or sort the file path list itself.

CodePudding user response:

Okay, I'm going to try to simplify this down so it's clear and concise. Minimal reproducible examples are much quicker and easier to answer than lengthy questions with GitHub links and screenshots.

As far as I can tell, your problem is this: You have data like this:

## nicely copy/pasteable sample data
## demonstrates the problem
## omits unneeded details
sample_data = c(
  "C:/path/0-3-1-1.csv", 
  "C:/path/0-3-1-10.csv",
  "C:/path/0-3-1-2.csv"
)

And you want to sort it by the numeric components separated by dashes, compared as numbers rather than as text, so the desired result is:

desired_result = c(
  "C:/path/0-3-1-1.csv", 
  "C:/path/0-3-1-2.csv",
  "C:/path/0-3-1-10.csv"
)

Here's an approach:

# extract the file names (as you have already done)
filenames = sample_data |> basename() |> tools::file_path_sans_ext()


my_order = filenames |> 
  # split apart the numbers
  strsplit(split = "-", fixed = TRUE) |>
  unlist() |> 
  # convert them to numeric and get them in a data frame
  as.numeric() |> 
  matrix(nrow = length(filenames), byrow = TRUE) |>
  as.data.frame() |>
  # get the appropriate ordering to sort the data frame
  # (the `_` pipe placeholder requires R 4.2 or later)
  do.call(order, args = _)

my_order
# [1] 1 3 2

sample_data[my_order]
# [1] "C:/path/0-3-1-1.csv"  "C:/path/0-3-1-2.csv"  "C:/path/0-3-1-10.csv"

The my_order result gives the indices to rearrange the original data to the desired result. You can use it on the sample_data or on just the extracted file names.
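Since the question is about a list of loaded CSVs, here is a minimal sketch of applying one ordering to the file paths and to the loaded datasets together. The names `filepaths` and `datasets` are hypothetical, and the small data frames stand in for what `read.csv()` would return, on the assumption that the list was built in the same order as the paths.

```r
# Hypothetical stand-ins: three file paths and, in the same order,
# the data frames that read.csv() would have produced for them.
filepaths <- c(
  "C:/path/0-3-1-1.csv",
  "C:/path/0-3-1-10.csv",
  "C:/path/0-3-1-2.csv"
)
datasets <- list(
  data.frame(id = 1),
  data.frame(id = 10),
  data.frame(id = 2)
)
names(datasets) <- tools::file_path_sans_ext(basename(filepaths))

# Split each name on "-" and convert the pieces to numbers:
# one row per file, one column per descriptor.
parts <- do.call(
  rbind,
  lapply(strsplit(names(datasets), "-", fixed = TRUE), as.numeric)
)

# order() with one argument per column sorts by column 1, then 2, and so on.
ord <- do.call(order, as.data.frame(parts))

# The same index vector reorders every object that shares the original order.
filepaths_sorted <- filepaths[ord]
datasets_sorted <- datasets[ord]
```

Because `ord` is only a vector of positions, nothing is sorted until it is used as an index, which is why editing the extracted names alone never changes the path list.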

Another solution is to use the gtools::mixedorder() or gtools::mixedsort() functions. Confusingly, when I tried them out on the sample data they gave the reverse order. Then I realized that the gtools functions interpret your - separators as negative signs. So to use that tool, we would need to replace - with a different character:

sample_data |> 
  gsub(pattern = "-", replacement = "|", fixed = TRUE) |>
  gtools::mixedorder()
# [1] 1 3 2
## same ordering result as above

CodePudding user response:

Here is another approach that follows essentially the same logic as @Gregor's answer: split the components out, then pass all of them as a list of arguments to the order function.

ord <- do.call(order,
         strcapture("(\\d+)-(\\d+)-(\\d+)-(\\d+)", 
                   basename(sample_data), proto=list(1L,1L,1L,1L)))
sample_data[ord]
#[1] "C:/path/0-3-1-1.csv"  "C:/path/0-3-1-2.csv"  "C:/path/0-3-1-10.csv"
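The splitting idea can also be wrapped in a small helper and checked against the twenty names from the question. This is only a sketch: `order_by_dash_numbers` is a hypothetical name, the `C:/path/` prefix is the made-up one from the sample data, and the helper assumes every file name has the same number of dash-separated numeric parts.

```r
# Hypothetical helper: returns the index vector that puts file paths in
# numeric order of their dash-separated name components.
order_by_dash_numbers <- function(paths) {
  stems <- tools::file_path_sans_ext(basename(paths))
  pieces <- strsplit(stems, "-", fixed = TRUE)
  # Assumption: every name has the same number of components.
  stopifnot(length(unique(lengths(pieces))) == 1L)
  # Build one numeric sort key per component position.
  keys <- lapply(seq_along(pieces[[1]]), function(i) {
    vapply(pieces, function(p) as.numeric(p[i]), numeric(1))
  })
  do.call(order, keys)
}

# The twenty names from the question, in the text-sorted order R listed them.
paths <- paste0("C:/path/0-3-1-", c(1, 10:19, 2, 20, 3:9), ".csv")
sorted_paths <- paths[order_by_dash_numbers(paths)]
sorted_stems <- tools::file_path_sans_ext(basename(sorted_paths))
```

With these inputs, `sorted_stems` runs from "0-3-1-1" through "0-3-1-20" in numeric order.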