How to extract a specific part of a character vector in R?

Time:06-03

I have file names like this as a character vector:

filenames <- c('1194-1220-1479-891--(133.07).RDS','1194-1221-1421-891--(101.51).RDS')

I don't want the digits in parentheses, and I want the remaining numbers separated by "/". So the desired output is:

filenames_desired <- c('1194/1220/1479/891','1194/1221/1421/891')

I tried gsub but couldn't work out how to remove the digits in parentheses.

Thanks in advance

CodePudding user response:

Using stringr with a lookahead, (?=-), which means the digits have to be followed by a dash, and sapply to join the matches:

filenames <- c('1194-1220-1479-891--(133.07).RDS','1194-1221-1421-891--(101.51).RDS')

sapply(stringr::str_extract_all(filenames, "\\d+(?=-)"), 
       paste0, 
       collapse = "/") 

[1] "1194/1220/1479/891" "1194/1221/1421/891"

CodePudding user response:

We could use a single sub() call here:

filenames <- c("1194-1220-1479-891--(133.07).RDS",
               "1194-1221-1421-891--(101.51).RDS")

output <- sub("(\\d+)-(\\d+)-(\\d+)-(\\d+).*", "\\1/\\2/\\3/\\4", filenames)
output

[1] "1194/1220/1479/891" "1194/1221/1421/891"

CodePudding user response:

As far as I can see, the first 18 characters of each name form the base of the final name, so you can use the following code (it relies on every name having that same width):

# Initial names
  filenames <- c('1194-1220-1479-891--(133.07).RDS','1194-1221-1421-891--(101.51).RDS')

# Extract the first 18 characters of each element of "filenames"
  nam <- substr(filenames, 1, 18)
# Replace "-" with "/" (str_replace_all comes from the stringr package)
  final.names <- stringr::str_replace_all(nam, "-", "/")
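
If the names are not all the same width, the fixed cut-off at 18 no longer holds. A minimal base R sketch, assuming every name contains the "--" separator, finds the cut position instead of hard-coding it. The second element of the example vector is made up purely to show a different width:

```r
# Example input; the second name is hypothetical and has a different width
filenames <- c("1194-1220-1479-891--(133.07).RDS",
               "12-345-6--(1.5).RDS")

# Position of the first "--" in each name (-1 if it is absent)
cut <- as.integer(regexpr("--", filenames, fixed = TRUE))

# Keep everything before the "--"
nam <- substr(filenames, 1, cut - 1L)

# Swap each "-" for "/" with base R's chartr()
final.names <- chartr("-", "/", nam)
print(final.names)
```

Names without a "--" would come back as empty strings here, so it is worth checking for cut == -1 first if that can happen.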

CodePudding user response:

You could use strsplit to split each name on '--', take the first piece of each result, and then use gsub:

gsub('-', '/', sapply(strsplit(filenames, '--'), `[[`, 1))

which will yield

#"1194/1220/1479/891" "1194/1221/1421/891"

CodePudding user response:

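Use gsub with the pattern \\--.* to remove the -- and everything after it. Note that this leaves the single dashes in place, so they still need to be replaced with "/" to get the desired output: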

gsub('\\--.*', '', filenames)

[1] "1194-1220-1479-891" "1194-1221-1421-891"
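
To get all the way to the slash-separated form, the trimming step can be followed by a second replacement. This is a small base R sketch of that two-step idea; the variable names trimmed and result are only illustrative:

```r
filenames <- c("1194-1220-1479-891--(133.07).RDS",
               "1194-1221-1421-891--(101.51).RDS")

# Step 1: drop the first "--" and everything after it
trimmed <- sub("--.*", "", filenames)

# Step 2: replace every remaining "-" with "/" (fixed = TRUE: literal match)
result <- gsub("-", "/", trimmed, fixed = TRUE)
print(result)
# [1] "1194/1220/1479/891" "1194/1221/1421/891"
```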