Can I use a vector as a regex pattern parameter in R?

Time:06-23

I want to search a phonetic dictionary (a TSV with two columns, one for words and one for their phonetic transcription in IPA) for certain consonant clusters according to the combination of consonant types (e.g. fricative + plosive, plosive + fricative, plosive + liquid, etc.). For each type I created a vector of the corresponding phonemes:

plosives <- c("p", "b", "t", "d", "k", "g")  
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

The point of writing these vectors in the first place is to have a shorthand for each consonant type that I can quickly reference when writing different regexes. I want to search for all two-consonant combinations of these two types (FP, PF, PP, FF). How can I write a regex in R using these vectors as pattern parameters?

I know crossing(fricatives, plosives) gives me all the combinations, but I get an error when I use it as the pattern:

CC.all <- str_extract_all(ruphondict$IPA, crossing(fricatives, plosives), simplify = TRUE)

CodePudding user response:

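Here is a base R way to build the regex: expand.grid() generates every plosive + fricative pair, and paste() joins the pairs with "|".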

paste(
  apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""),
  collapse = "|"
)

Note that this is in fact a one-liner.

paste(apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""), collapse = "|")
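
The call above only covers plosive + fricative pairs, while the question asks for all four type combinations (FP, PF, PP, FF). As a rough sketch of one way to generalise it in base R, the pairing step can be wrapped in a small helper. The name cluster_regex is made up for this illustration, and outer() is used here in place of expand.grid(); both produce the same pairs.

```r
plosives   <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

# Hypothetical helper (not from the original answer): pair every element
# of `first` with every element of `second` and join the pairs with "|".
cluster_regex <- function(first, second) {
  pairs <- outer(first, second, paste0)  # matrix of two-phoneme strings
  paste(pairs, collapse = "|")
}

# Plosive + fricative only, same result as the expand.grid() version.
pf <- cluster_regex(plosives, fricatives)

# All four combinations (PP, PF, FP, FF): pair the pooled consonants
# with themselves.
consonants <- c(plosives, fricatives)
cc_all <- cluster_regex(consonants, consonants)
```

Because the helper takes the two types as arguments, other combinations (for example plosive + liquid) would only need another vector of phonemes.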

CodePudding user response:

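str_extract_all() expects its pattern to be a character string, but crossing() returns a tibble (a data frame), so you need to collapse the combinations into a single |-delimited string to use as the regular expression: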

plosives <- c("p", "b", "t", "d", "k", "g")  
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

library(tidyr)  # crossing()
library(dplyr)  # mutate(), pull()

my_regex <- (crossing(plosives, fricatives) 
    |> mutate(comb = paste0(plosives, fricatives)) 
    |> pull(comb) 
    |> paste(collapse = "|")
)
my_regex
[1] "bf|bs|bʂ|bv|bx|bz|bʐ|df|ds|dʂ|dv|dx|dz|dʐ|gf|gs|gʂ|gv|gx|gz|gʐ|kf|ks|kʂ|kv|kx|kz|kʐ|pf|ps|pʂ|pv|px|pz|pʐ|tf|ts|tʂ|tv|tx|tz|tʐ"
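
Once the pattern is a single string, it can be passed to an extraction function. The following is a minimal sketch in base R so that it runs without extra packages. The ipa vector is invented sample data standing in for ruphondict$IPA, and the pattern here pools both consonant types to cover PP, PF, FP and FF, which goes beyond the plosive + fricative regex above.

```r
plosives   <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")
consonants <- c(plosives, fricatives)

# One alternation covering every two-consonant pair of either type.
cc_pattern <- paste(outer(consonants, consonants, paste0), collapse = "|")

# Hypothetical sample data, standing in for ruphondict$IPA.
ipa <- c("ptitsa", "kto", "stol", "dom")

# gregexpr() finds all non-overlapping matches in each string and
# regmatches() extracts them, giving one character vector per word.
cc_all <- regmatches(ipa, gregexpr(cc_pattern, ipa))
# cc_all[[1]] is c("pt", "ts"); cc_all[[4]] is character(0)
```

With stringr loaded, str_extract_all(ipa, cc_pattern) should play the same role. Matches do not overlap, so in a run of three or more consonants only non-overlapping pairs are returned.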