I have the following code:
for (fileName in fileNames) {
  index <- "0"
  if (grepl("_01", fileName, fixed = TRUE)) {
    index <- "01"
  }
  if (grepl("_02", fileName, fixed = TRUE)) {
    index <- "02"
  }
}
and so on.
My file names look like "31231_sad_01.csv" or "31231_happy_01.csv".
All of my file names are stored in a character vector fileNames, and I loop through each one.
How can I find the last part of the file name, i.e. 01 or 02 in these examples?
I tried the code above, and it always returns 1 for every value.
CodePudding user response:
Try the following:
library(stringr)
# suppose you have your file names in a character vector
fnames <- c("31231_sad_01.csv", "31231_happy_02.csv")
# extract every run of digits, then keep the second run of each name
unlist(lapply(str_extract_all(fnames, "\\d+"), '[', 2))
It would return a vector
[1] "01" "02"
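Picking the second run of digits assumes every file name has exactly one number before the index. If that is not guaranteed, a minimal base R sketch that takes the last run of digits instead might look like this. The helper name last_number and the extra test names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: return the last run of digits in each file name,
# or NA when a name contains no digits at all.
last_number <- function(x) {
  # gregexpr() locates every run of digits; regmatches() extracts them as a list
  runs <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(
    runs,
    function(r) if (length(r) > 0) r[length(r)] else NA_character_,
    character(1)
  )
}

fnames <- c("31231_sad_01.csv", "31231_happy_02.csv", "notes.csv")
last_number(fnames)
#> [1] "01" "02" NA
```

The explicit NA branch is there because indexing an empty match would otherwise make vapply fail on names without digits.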
CodePudding user response:
An alternative way is to use sub to extract parts of the strings. Your examples show that the targeted index in each file name is always located after the last _ and before .csv. We can use this pattern in sub:
library(magrittr)
findex <- function(filename) {
  filename %>%
    sub("\\.csv.*", "", .) %>% # keep the part before ".csv" (dot escaped so it is literal)
    sub(".*_", "", .)          # keep the part after the last "_"
}
This method works for an index of any length.
Test:
findex("31231_sad_01.csv")
#[1] "01"
findex("31231_happy_02.csv")
#[1] "02"
findex("31231_happy_213.csv")
#[1] "213"
findex("31231_happy_15213.csv")
#[1] "15213"
Then, you can apply it with lapply or vapply to the vector that contains all the names:
names <- c("31231_happy_1032.csv", "31231_happy_02.csv", "31231_happy_213.csv", "31231_happy_15213.csv")
lapply(names, findex)
#[[1]]
#[1] "1032"
#[[2]]
#[1] "02"
#[[3]]
#[1] "213"
#[[4]]
#[1] "15213"
vapply(names, findex, character(1))
# 31231_happy_1032.csv 31231_happy_02.csv 31231_happy_213.csv
# "1032" "02" "213"
#31231_happy_15213.csv
# "15213"
If you want to avoid the magrittr pipe and use only base R, nesting the two sub calls should work:
findex1 <- function(filename) sub(".*_", "", sub("\\.csv.*", "", filename))
vapply(names, findex1, character(1))
# 31231_happy_1032.csv 31231_happy_02.csv 31231_happy_213.csv
# "1032" "02" "213"
#31231_happy_15213.csv
# "15213"
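Since sub is itself vectorized, the two steps can also be folded into a single call with a capture group, with no lapply or vapply needed. This is only a sketch and assumes every name ends in an underscore, a run of digits, and ".csv"; the vector fnames is illustrative.

```r
fnames <- c("31231_happy_1032.csv", "31231_sad_01.csv", "31231_happy_15213.csv")

# Capture the digits between the last "_" and the ".csv" suffix.
# Note: sub() returns a name unchanged when the pattern does not match it.
index <- sub("^.*_([0-9]+)\\.csv$", "\\1", fnames)
index
```

For these three names, index holds "1032", "01" and "15213". Because non-matching names come back unchanged, it may be worth checking the input with grepl first if the naming scheme is not strict.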
CodePudding user response:
Vectorized alternatives exist, so there is no need for a loop.
To check whether the last numeric part of a file name ends with a specific number, here 01, we can first extract the numeric part and then run endsWith.
string <- c("31231_sad_01.csv", "bla_215.csv", "test_05.csv")
endsWith(stringr::str_extract(string, "[^_]*(?=\\.csv)"), "01")
#> [1] TRUE FALSE FALSE
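The same idea can be used to select the matching files directly in base R. The sketch below assumes the index is always preceded by an underscore; it tests for the suffix "_01" on the name without its extension, so a name such as "bla_201.csv" would not be picked up by accident.

```r
fileNames <- c("31231_sad_01.csv", "31231_happy_02.csv", "bla_215.csv")

# Drop the extension, then test the suffix on the whole vector at once.
stems <- tools::file_path_sans_ext(fileNames)
selected <- fileNames[endsWith(stems, "_01")]
selected
#> [1] "31231_sad_01.csv"
```

Matching on "_01" rather than "01" is a design choice for whole-index matches; use the shorter suffix if partial matches are what you want.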