I have a data frame like the one below. I want to extract every element of the column "phone" that also appears in the vector "x", and put the result into the column "vowel".
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")
df
word      phone                vowel
THERE     DH, EH, AH           NA
MUSHROOM  M, AH, SH, R, UW, M  NA
YOU       Y, UW                NA
IT'S      IH, T, S             NA
The expected outcome would look like this:
df
word      phone                vowel
THERE     DH, EH, AH           EH, AH
MUSHROOM  M, AH, SH, R, UW, M  AH
YOU       Y, UW
IT'S      IH, T, S             IH
I tried the following code, but it only yields "AH" in all the rows.
for (i in df$phone){
if (i %in% x){
df$vowel <- i
}
}
Could someone help me out? Thanks in advance!
CodePudding user response:
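df$vowel=sapply(
  df$phone,
  function(a){
    # split e.g. "DH, EH, AH" into its individual symbols
    b=strsplit(a,", ")[[1]]
    # keep only the symbols found in x and join them back together
    paste(b[b %in% x],collapse=", ")
  }
)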
  word      phone                vowel
1 THERE     DH, EH, AH           EH, AH
2 MUSHROOM  M, AH, SH, R, UW, M  AH
3 YOU       Y, UW
4 IT'S      IH, T, S             IH
If your phone variable contains messy entries, such as the row added here:
df=rbind(df,c('MOM','c("EH", "IH", "OW"), c("EH", "AH")...',NA))
add a=gsub('([c\\()..."])*([A-Z]+)*',"\\2",a)
before splitting the string.
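In case a reproducible version is useful, here is a minimal base R sketch of the whole pipeline, messy row included. Note that it swaps the gsub cleanup for a different technique: it pulls out every run of capital letters with regmatches() and gregexpr(), then filters with %in%. The data frame is rebuilt from the question as an assumption, and extract_vowels is a hypothetical helper name used only for illustration.

```r
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Assumed data: the four rows from the question plus the messy "MOM" row
df <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "IT'S", "MOM"),
  phone = c("DH, EH, AH", "M, AH, SH, R, UW, M", "Y, UW", "IH, T, S",
            'c("EH", "IH", "OW"), c("EH", "AH")...'),
  stringsAsFactors = FALSE
)

# extract_vowels is an illustrative helper, not part of any package
extract_vowels <- function(phone, vowels) {
  # treat every run of capital letters as one phone symbol;
  # perl = TRUE keeps [A-Z] independent of the locale
  tokens <- regmatches(phone, gregexpr("[A-Z]+", phone, perl = TRUE))
  # keep the symbols that are vowels and join them into one string per row
  vapply(tokens,
         function(tok) paste(tok[tok %in% vowels], collapse = ", "),
         character(1))
}

df$vowel <- extract_vowels(df$phone, x)
print(df)
```

Rows without any vowel end up with an empty string rather than NA, which matches the expected output in the question.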
CodePudding user response:
You can try
d1$vowel <- stringr::str_extract_all(d1$phone, paste(x, collapse = '|'))
> d1
  word      phone           vowel
1 THERE     DH,EH,AH        EH, AH
2 MUSHROOM  M,AH,SH,R,UW,M  AH
3 YOU       Y,UW
4 ITS       IH,T,S          IH
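Two things may be worth keeping in mind with this approach: str_extract_all() returns a list, so vowel becomes a list column, and a bare alternation pattern also matches inside longer symbols (a hypothetical "AHX" would still yield "AH"). The sketch below is a base R variant, not the stringr call itself, that adds word boundaries and collapses each set of matches into a single string. The d1 data frame is an assumption reconstructed from the output above.

```r
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Assumed data, reconstructed from the printed d1
d1 <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "ITS"),
  phone = c("DH,EH,AH", "M,AH,SH,R,UW,M", "Y,UW", "IH,T,S"),
  stringsAsFactors = FALSE
)

# \\b on both sides stops "AH" from matching inside a longer symbol
pattern <- paste0("\\b(", paste(x, collapse = "|"), ")\\b")

# one character vector of matches per row
matches <- regmatches(d1$phone, gregexpr(pattern, d1$phone, perl = TRUE))

# collapse each vector into a single comma-separated string
d1$vowel <- vapply(matches, paste, character(1), collapse = ", ")
print(d1)
```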
CodePudding user response:
Hi, since your question has no R code to generate df, here is a small example instead. I hope it is useful for reference.
a="DH, EH, AH"
b=c('AH', 'EH', 'OO')
sapply(b, FUN=function(p) grepl(pattern=p, x=a))
   AH    EH    OO
 TRUE  TRUE FALSE
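To carry the grepl idea over to the whole column, something along these lines might work. It is only a sketch under a few assumptions: the data frame is rebuilt from the question, the matching is plain substring matching (fixed = TRUE), and the vowels come out in the order they appear in x rather than in the order they appear in phone, with repeated vowels reported once.

```r
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Assumed data, rebuilt from the question
df <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "IT'S"),
  phone = c("DH, EH, AH", "M, AH, SH, R, UW, M", "Y, UW", "IH, T, S"),
  stringsAsFactors = FALSE
)

df$vowel <- vapply(df$phone, function(p) {
  # for each vowel in x, check whether it occurs anywhere in this phone string
  hit <- vapply(x, function(v) grepl(v, p, fixed = TRUE), logical(1))
  # keep the vowels that were found and join them into one string
  paste(x[hit], collapse = ", ")
}, character(1), USE.NAMES = FALSE)

print(df)
```

If the order of appearance or repeated vowels matter, splitting the string and filtering with %in% is probably the safer route.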