Filling a column by extracting words from another column

Time:12-16

I have a data frame like the one below. I want to extract every element of the "phone" column that appears in the vector "x" and put the matches into the "vowel" column.

x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")
df
word           phone                 vowel 
THERE          DH, EH, AH            NA
MUSHROOM       M, AH, SH, R, UW, M   NA
YOU            Y, UW                 NA
IT'S           IH, T, S              NA

The expected outcome would look like this:

df
word           phone                 vowel 
THERE          DH, EH, AH            EH, AH
MUSHROOM       M, AH, SH, R, UW, M   AH
YOU            Y, UW                 
IT'S           IH, T, S              IH

I tried the following code, but it only yields "AH" in all the rows.

for (i in df$phone) {
  if (i %in% x) {
    df$vowel <- i
  }
}

Could someone help me out a bit? Thanks in advance!

CodePudding user response:

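# For each phone string: split on ", ", keep the pieces found in x, re-join them
df$vowel <- sapply(
  df$phone,
  function(a) {
    b <- strsplit(a, ", ")[[1]]
    paste(b[b %in% x], collapse = ", ")
  },
  USE.NAMES = FALSE  # avoid naming each result after its phone string
)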

      word               phone  vowel
1    THERE          DH, EH, AH EH, AH
2 MUSHROOM M, AH, SH, R, UW, M     AH
3      YOU               Y, UW       
4     IT'S            IH, T, S     IH
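
The question does not include code to build df, so here is a minimal self-contained sketch of the approach above. The data.frame() call is an assumed reconstruction of the question's table, and vapply() is swapped in for sapply() only to guarantee one string per row. It also shows why the original loop could not work: every `df$vowel <- i` overwrites the whole column, whereas here each row gets its own value.

```r
# Vowel labels from the question
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Assumed reconstruction of the question's data frame
df <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "IT'S"),
  phone = c("DH, EH, AH", "M, AH, SH, R, UW, M", "Y, UW", "IH, T, S"),
  vowel = NA_character_,
  stringsAsFactors = FALSE
)

# Split every phone string into tokens, keep the tokens found in x,
# and paste them back together; rows without a vowel get an empty string
df$vowel <- vapply(
  strsplit(df$phone, ", ", fixed = TRUE),
  function(tokens) paste(tokens[tokens %in% x], collapse = ", "),
  character(1)
)

print(df)
```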

In case your phone variable contains irregular entries, such as

df=rbind(df,c('MOM','c("EH", "IH", "OW"), c("EH", "AH")...',NA))

apply gsub('([c\\()..."])*([A-Z] )*',"\\2",a) to a before splitting the string.
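
As an alternative to cleaning the string first, one could pull out the uppercase tokens directly and ignore everything else. This is a different technique from the gsub() call above, sketched with base R's gregexpr() and regmatches(); the helper name extract_vowels is made up for illustration, and it assumes every phone label consists only of uppercase letters.

```r
# Vowel labels from the question
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# One regular entry and the irregular entry from the answer above
phone <- c("DH, EH, AH", 'c("EH", "IH", "OW"), c("EH", "AH")...')

# Hypothetical helper: treat every run of uppercase letters as a token,
# then keep only the tokens that are listed in `vowels`
extract_vowels <- function(phone, vowels) {
  tokens <- regmatches(phone, gregexpr("[A-Z]+", phone))
  vapply(
    tokens,
    function(tok) paste(tok[tok %in% vowels], collapse = ", "),
    character(1)
  )
}

res <- extract_vowels(phone, x)
print(res)
```

Because the lowercase c, quotes, parentheses and dots never match [A-Z]+, they are dropped without any separate clean-up step.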

CodePudding user response:

You can try

d1$vowel <- stringr::str_extract_all(d1$phone, paste(x, collapse = '|'))

> d1
      word          phone  vowel
1    THERE       DH,EH,AH EH, AH
2 MUSHROOM M,AH,SH,R,UW,M     AH
3      YOU           Y,UW       
4      ITS         IH,T,S     IH
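
Note that str_extract_all() returns a list, so vowel becomes a list column, and a bare alternation also matches inside longer labels. A hedged base R sketch of the same idea, using gregexpr() and regmatches() instead of stringr, with word boundaries added and the matches collapsed into one string per row; the last phone entry is a made-up label included only to show the boundary check.

```r
# Vowel labels from the question
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Phone strings as in d1 above; "XAH,T" is a hypothetical entry
phone <- c("DH,EH,AH", "M,AH,SH,R,UW,M", "Y,UW", "IH,T,S", "XAH,T")

# \\b on both sides makes the pattern match whole labels only
pattern <- paste0("\\b(", paste(x, collapse = "|"), ")\\b")

# One character vector of matches per element of phone
hits <- regmatches(phone, gregexpr(pattern, phone, perl = TRUE))

# Collapse each set of matches into a single string (empty if none)
vowel <- vapply(hits, paste, character(1), collapse = ", ")
print(vowel)
```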

CodePudding user response:

Hi, as there is no R code to generate df in your question, here is a small example instead. I hope it is useful as a reference.

a="DH, EH, AH"
b=c('AH', 'EH', 'OO')
# For each candidate in b, test whether it occurs anywhere in a
sapply(b, FUN=function(b) grepl(pattern=b, x=a))

   AH    EH    OO 
 TRUE  TRUE FALSE 