How to check if filename ends with a certain string? (R)

Time:03-15

I have the following code:

for (fileName in fileNames) {
    index <- "0"

    if (grepl("_01", fileName, fixed = TRUE)) {
        index <- "01"
    }

    if (grepl("_02", fileName, fixed = TRUE)) {
        index <- "02"
    }
}

and so on.

My filenames look like "31231_sad_01.csv" or "31231_happy_01.csv".

All of my filenames are stored in a character vector fileNames. I loop through each file.

How can I find the last part of the filename, i.e. 01 or 02 in this case?

I tried using the code I mentioned and it always returns 1 for every value.

CodePudding user response:

Try the following:

# suppose you have your file names in a character vector
library(stringr)

fnames <- c("31231_sad_01.csv", "31231_happy_02.csv")
unlist(lapply(str_extract_all(fnames, "\\d+"), '[', 2))

It would return a vector

[1] "01" "02"
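
Taking the second match assumes each filename contains exactly one number before the index. A minimal base R sketch that anchors on the extension instead, assuming the index is the run of digits directly before ".csv" (the third filename is made up for illustration):

```r
# Hypothetical file names; the third one contains extra digits on purpose.
fnames <- c("31231_sad_01.csv", "31231_happy_02.csv", "a1_b2_c3_07.csv")

# Locate the digits that are immediately followed by ".csv" at the end of the name.
# perl = TRUE is needed for the lookahead (?=...).
m <- regexpr("\\d+(?=\\.csv$)", fnames, perl = TRUE)

# regmatches() returns the matched text; names without a match are dropped.
index <- regmatches(fnames, m)
index
```

Because regmatches drops non-matching elements, the result can be shorter than the input if some filenames do not follow the pattern.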

CodePudding user response:

An alternative way is to use sub to extract parts of the strings. Your examples show that the targeted index in each file name is always located after _ and before .csv. We can use this pattern in sub:

library(magrittr)

findex <- function(filename) {
  filename %>%
    sub("\\.csv.*", "", .) %>%  # drop ".csv" and anything after it
    sub(".*_", "", .)           # keep only the part after the last "_"
}

This method works for an index of any length.

Test:

findex("31231_sad_01.csv")
#[1] "01"
findex("31231_happy_02.csv")
#[1] "02"
findex("31231_happy_213.csv")
#[1] "213"
findex("31231_happy_15213.csv")
#[1] "15213"

Then you can apply lapply or vapply to the vector that contains all the names:

names <- c("31231_happy_1032.csv", "31231_happy_02.csv", "31231_happy_213.csv", "31231_happy_15213.csv")
lapply(names, findex)
#[[1]]
#[1] "1032"

#[[2]]
#[1] "02"

#[[3]]
#[1] "213"

#[[4]]
#[1] "15213"

vapply(names, findex, character(1))
#31231_happy_1032.csv    31231_happy_02.csv   31231_happy_213.csv 
#               "1032"                  "02"                 "213" 
#31231_happy_15213.csv 
#              "15213" 

In case you want to use only base R, this should work:

findex1 <- function(filename) sub(".*_", "", sub("\\.csv.*", "", filename))

vapply(names, findex1, character(1))
# 31231_happy_1032.csv    31231_happy_02.csv   31231_happy_213.csv 
#               "1032"                  "02"                 "213" 
#31231_happy_15213.csv 
#              "15213" 
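
Since sub is itself vectorized, the same extraction can also be applied to the whole vector in one call, without lapply or vapply. A minimal sketch using a single pattern with a capture group, assuming every name has the form "something_INDEX.csv" (the variable name fnames is chosen here for illustration):

```r
fnames <- c("31231_happy_1032.csv", "31231_happy_02.csv",
            "31231_happy_213.csv", "31231_happy_15213.csv")

# Match the whole name, capture the part between the last "_" and ".csv",
# and replace the name with that captured part.
index <- sub("^.*_([^_]*)\\.csv$", "\\1", fnames)
index
```

The result is a plain, unnamed character vector. A name that does not match the pattern is returned unchanged, so it may be worth checking for such cases first.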

CodePudding user response:

Vectorized alternatives exist, so there is no need for a loop.

To check if the last part of the filename (after the final _ and before .csv) ends with a specific number, here 01, we can first extract that part, then run endsWith.

string <- c("31231_sad_01.csv", "bla_215.csv", "test_05.csv")
endsWith(stringr::str_extract(string, "[^_]*(?=\\.csv)"), "01")

#> [1]  TRUE FALSE FALSE
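
The same kind of check can be sketched in base R only, assuming the suffix to test for is "_01" followed by the ".csv" extension. The fourth filename is made up to show one difference: here the whole last segment must be "01", so an index such as 101 does not count.

```r
string <- c("31231_sad_01.csv", "bla_215.csv", "test_05.csv", "x_101.csv")

# Option 1: strip the extension, then test the end of what remains.
ends_01 <- endsWith(tools::file_path_sans_ext(string), "_01")

# Option 2: a single regular expression anchored at the end of the name.
ends_01_regex <- grepl("_01\\.csv$", string)

ends_01
ends_01_regex
```

Both calls are vectorized and return one logical value per filename, which can be used directly to subset the vector, e.g. string[ends_01].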