grep words and create a new column for used words in R

Time:10-19

I have a data frame and a set of keywords, and I use grep to check whether each row contains any of them. When a row matches, I would like a new column that records which keyword was found.

This is what my data looks like:

id // skills
1 // this is a skill for xoobst
2 // artificial intelligence
3 // logistic regression

I used the code below to keep the rows that match any keyword.

keyword <- "xoobst|logistic|intelligence"
result <- df[grep(keyword, df$skills, ignore.case = TRUE),]

This is the outcome I want:

id // skills // words
1 // this is a skill for xoobst // xoobst
2 // artificial intelligence // intelligence
3 // logistic regression // logistic

I tried the code below, but it returned the full sentence rather than just the keyword that matched.

keys <- sprintf(".*(%s).*", keyword)
df$words <- sub(keys, "\\1", df$skills)

What alternative approach could I use? Thank you in advance!

CodePudding user response:

You can use stringr:

df <- data.frame(
  id = c(1, 2, 3), 
  skills = c("this is a skill for xoobst", "artificial intelligence", "logistic regression")
)

df |>
  dplyr::mutate(words = stringr::str_extract(skills, "xoobst|logistic|intelligence"))
#>   id                     skills        words
#> 1  1 this is a skill for xoobst       xoobst
#> 2  2    artificial intelligence intelligence
#> 3  3        logistic regression     logistic
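
The question's grep call used ignore.case, which the pattern above does not carry over. A minimal sketch of a case-insensitive variant, assuming the stringr package is installed; the capitalised text in row 2 is hypothetical data added for illustration:

```r
# Sample data; the capitalisation in row 2 is hypothetical
df <- data.frame(
  id = c(1, 2, 3),
  skills = c("this is a skill for xoobst", "Artificial Intelligence", "logistic regression")
)

# Reuse the keyword string from the question
keyword <- "xoobst|logistic|intelligence"

# regex(..., ignore_case = TRUE) makes the match case-insensitive;
# str_extract() returns the first match per row, or NA when nothing matches
df$words <- stringr::str_extract(df$skills, stringr::regex(keyword, ignore_case = TRUE))
df$words
#> [1] "xoobst"       "Intelligence" "logistic"
```

The extracted text keeps the casing found in the data, so wrap the result in tolower() if the column should hold normalised keywords.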

CodePudding user response:

Using R base functions:

> df$words <- gsub(".*(xoobst|logistic|intelligence).*", "\\1", df$skills)
> df
  id                     skills        words
1  1 this is a skill for xoobst       xoobst
2  2    artificial intelligence intelligence
3  3        logistic regression     logistic
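
One caveat with the gsub pattern: a row that contains none of the keywords is returned unchanged, so the full sentence ends up in the new column. A possible base R alternative, sketched here with regexpr and regmatches, leaves NA in those rows instead. The fourth row is hypothetical and added only to show the no-match case:

```r
# Sample data from the question, plus one hypothetical row with no keyword
df <- data.frame(
  id = 1:4,
  skills = c("this is a skill for xoobst", "artificial intelligence",
             "logistic regression", "public speaking")
)
keyword <- "xoobst|logistic|intelligence"

# regexpr() gives the position of the first match per row (-1 when there is none)
m <- regexpr(keyword, df$skills, ignore.case = TRUE)

# Start from NA and fill in only the rows that matched;
# regmatches() returns the matched substrings for exactly those rows
df$words <- NA_character_
df$words[m != -1] <- regmatches(df$skills, m)
df$words
# [1] "xoobst"       "intelligence" "logistic"     NA
```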

CodePudding user response:

Using grep with sapply and strsplit.

df$words <- sapply(strsplit(df$skills, " "), function(x) grep(keyword, x, value = TRUE))
df
  id                     skills        words
1  1 this is a skill for xoobst       xoobst
2  2    artificial intelligence intelligence
3  3        logistic regression     logistic

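This assumes that individual keywords don't contain spaces. It also assumes that every row has exactly one matching token; if any row has none or several, sapply returns a list rather than a character vector.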
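
If a row can contain more than one keyword, one option is to collect every match and collapse them into a single string. This is a sketch in base R using gregexpr and regmatches; the fourth row and the ", " separator are assumptions made for illustration:

```r
# Sample data from the question, plus one hypothetical row with two keywords
df <- data.frame(
  id = 1:4,
  skills = c("this is a skill for xoobst", "artificial intelligence",
             "logistic regression", "logistic regression and artificial intelligence")
)
keyword <- "xoobst|logistic|intelligence"

# gregexpr() finds all matches per row; regmatches() returns them as a list
hits <- regmatches(df$skills, gregexpr(keyword, df$skills, ignore.case = TRUE))

# Collapse each row's matches into one string; rows without a match get NA
df$words <- vapply(
  hits,
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1)
)
df$words
# [1] "xoobst"                 "intelligence"           "logistic"
# [4] "logistic, intelligence"
```

Because vapply is told to expect character(1), the result is always a character vector, regardless of how many keywords each row contains.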
