I want to search a phonetic dictionary (a TSV with two columns: one for words, the other for their IPA transcription) for certain consonant clusters according to the type combination (e.g. fricative-plosive, plosive-fricative, plosive-liquid, etc.). I created one vector per consonant type, listing the corresponding phonemes:
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")
The point of writing these vectors in the first place is to have a shorthand for quickly referencing each consonant type when writing different regexes. I want to search for all two-consonant combinations of these two types (FP, PF, PP, FF). How can I write a regex in R using these vectors as pattern parameters?
I know crossing(fricatives, plosives) gives me all combinations as a string, but I get an error when using it in: CC.all <- str_extract_all(ruphondict$IPA, crossing(fricatives, plosives), simplify = T)
CodePudding user response:
A base R way to form a regex.
paste(
apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""),
collapse = "|"
)
Note that this is in fact a one-liner.
paste(apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""), collapse = "|")
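To cover all four type orders from the question (PP, PF, FP, FF) and actually pull the clusters out of the transcriptions, the same idea can be wrapped in a small helper. This is only a sketch in base R: `cluster_regex` is a name made up here, and the `ipa` vector is invented sample data standing in for `ruphondict$IPA`.

```r
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

# Hypothetical helper: build an alternation of every ordered pair
# taken from two phoneme vectors.
cluster_regex <- function(first, second) {
  pairs <- expand.grid(first = first, second = second, stringsAsFactors = FALSE)
  paste(paste0(pairs$first, pairs$second), collapse = "|")
}

# Crossing the combined vector with itself yields PP, PF, FP and FF at once.
both <- c(plosives, fricatives)
cc_all <- cluster_regex(both, both)

# Invented sample transcriptions (assumption: stands in for ruphondict$IPA).
ipa <- c("stol", "kto", "dom")

# One character vector of matched clusters per transcription.
matches <- regmatches(ipa, gregexpr(cc_all, ipa))
```

Note that `gregexpr` returns non-overlapping matches, so in a three-consonant run only the first pair is reported.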
CodePudding user response:
You need to make a |-delimited string to use as a regular expression:
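library(tidyr)  # crossing()
library(dplyr)  # mutate(), pull()
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")
# crossing() returns a tibble of all pairs; collapse it into one pattern string
my_regex <- (crossing(plosives, fricatives)
|> mutate(comb = paste0(plosives, fricatives))
|> pull(comb)
|> paste(collapse = "|")
)
my_regex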
[1] "bf|bs|bʂ|bv|bx|bz|bʐ|df|ds|dʂ|dv|dx|dz|dʐ|gf|gs|gʂ|gv|gx|gz|gʐ|kf|ks|kʂ|kv|kx|kz|kʐ|pf|ps|pʂ|pv|px|pz|pʐ|tf|ts|tʂ|tv|tx|tz|tʐ"
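Since every phoneme in these vectors is a single character, a shorter alternative might be to turn each vector into a bracket expression and concatenate those. This is a sketch, not the method used above: `char_class` is a hypothetical helper, the `ipa` vector is invented sample data, and it assumes a UTF-8 locale and that no phoneme is a character with special meaning inside brackets (such as `]`, `^` or `-`).

```r
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

# Hypothetical helper: collapse single-character phonemes into "[...]".
char_class <- function(phonemes) {
  paste0("[", paste(phonemes, collapse = ""), "]")
}

# Plosive followed by fricative.
pf <- paste0(char_class(plosives), char_class(fricatives))

# Any two consecutive consonants from either type (PP, PF, FP, FF).
cc_all <- paste0(char_class(c(plosives, fricatives)), "{2}")

# Invented sample transcriptions (assumption: stands in for ruphondict$IPA).
ipa <- c("ksi", "stol", "dom")
has_pf <- grepl(pf, ipa)
has_cc <- grepl(cc_all, ipa)
```

The per-type classes can be combined freely, e.g. `paste0(char_class(fricatives), char_class(plosives))` for FP only. Multi-character phonemes such as affricates written with two symbols would need the alternation approach instead.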