Subsetting a string vector based on a partial match of unknown characters

Time:12-15

I have a vector of 8-character file names of the format

"/relative/path/to/folder/a(bc|de|fg)...[xy]1.sav"

where each bracketed group holds one of two or three known alternatives, and the '...' stands for three unknown characters. I want to find all strings that share the same unknown three-character sequence and group them into a list of character vectors.

I am not sure how to proceed. I am thinking of extracting the letters in the fourth to sixth positions of the file name (the '...'), putting them into a vector, and then using `grep` to get all the files that contain each extracted string.

E.g.

# Pseudo-code. Not functioning code, but sort of the thing I want to do

> char.extr <- str_extract(file.vector, !"a(bc|de|fg)...[xy]1.sav")
> char.extr

"JKL", "MNO" ,"PQR" ...

# Use grep and lapply to put matched strings into list

> path.list <- lapply(char.extr, grep, file.vector)

> path.list

  1. "/relative/path/to/folder/abcJKLx1.sav"
     "/relative/path/to/folder/adeJKLy1.sav"
  
  2. "/relative/path/to/folder/afgMNOx1.sav"
     "/relative/path/to/folder/abcMNOy1.sav"

Answer:

Since we know the name structure, I'd imagine that extracting the three-letter substring from each file name and then using split to group the paths by that substring is what you're looking for.

# file.vector is the input vector of paths from the question
path.list <- split(file.vector, substr(basename(file.vector), 4, 6))
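
As a minimal sketch of this approach, here it is run on the four example paths from the question. The contents of file.vector and the names key, pattern, ok and key.regex are illustrative assumptions, not part of the original post. The second half shows an optional regex-based check, in case some file names might not follow the expected structure.

```r
# Hypothetical sample input, built from the paths shown in the question
file.vector <- c(
  "/relative/path/to/folder/abcJKLx1.sav",
  "/relative/path/to/folder/adeJKLy1.sav",
  "/relative/path/to/folder/afgMNOx1.sav",
  "/relative/path/to/folder/abcMNOy1.sav"
)

# Strip the directory part, then take characters 4 to 6 of the file name
key <- substr(basename(file.vector), 4, 6)

# Group the full paths by that key; the result is a list named by key
path.list <- split(file.vector, key)

names(path.list)
path.list[["JKL"]]

# Optional: use the known name structure as a regex, so that only
# file names matching the pattern are considered
pattern <- "^a(bc|de|fg)(...)[xy]1\\.sav$"
ok <- grepl(pattern, basename(file.vector))

# "\\2" refers to the second capture group, i.e. the three unknown characters
key.regex <- sub(pattern, "\\2", basename(file.vector[ok]))
path.list.regex <- split(file.vector[ok], key.regex)
```

Positional extraction with substr is the simplest option when every name has the same layout. The regex variant does the same grouping but silently drops any path whose file name does not match the pattern.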