I would like to create a new column containing the keyword that matched in a grep search. I have a data frame and a set of keywords, and I want to check whether each row contains any of them. When a row does contain a keyword, I would like a newly created column to show which one it was.
This is what my data looks like:
id // skills
1 // this is a skill for xoobst
2 // artificial intelligence
3 // logistic regression
I used the code below to grep the keywords:
keyword <- "xoobst|logistic|intelligence"
result <- df[grep(keyword, df$skills, ignore.case = TRUE), ]
This is the outcome I want:
id // skills // words
1 // this is a skill for xoobst // xoobst
2 // artificial intelligence // intelligence
3 // logistic regression // logistic
I tried the code below, but it returned the full sentence rather than just the keyword that matched.
keys <- sprintf(".*(%s).*", keyword)
df$words <- sub(keys, "\\1", df$skills)
What alternative approach should I use? Thank you in advance!
CodePudding user response:
You can use stringr:
df <- data.frame(
id = c(1, 2, 3),
skills = c("this is a skill for xoobst", "artificial intelligence", "logistic regression")
)
df |>
  dplyr::mutate(words = stringr::str_extract(skills, "xoobst|logistic|intelligence"))
#>   id                     skills        words
#> 1  1 this is a skill for xoobst       xoobst
#> 2  2    artificial intelligence intelligence
#> 3  3        logistic regression     logistic
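If some rows might not contain any keyword, or the case differs, the same idea can be sketched as follows. The fourth row and the capitalised "Logistic" are made up for illustration and are not part of the original data. str_extract returns NA where nothing matches, and stringr::regex(..., ignore_case = TRUE) mirrors the ignore.case argument used with grep in the question.

```r
# Hypothetical data: row 4 and the capitalised "Logistic" are added for illustration.
df <- data.frame(
  id = 1:4,
  skills = c("this is a skill for xoobst", "artificial intelligence",
             "Logistic regression", "project management"),
  stringsAsFactors = FALSE
)
keyword <- "xoobst|logistic|intelligence"

# Case-insensitive extraction of the first keyword found in each row.
df$words <- stringr::str_extract(
  df$skills,
  stringr::regex(keyword, ignore_case = TRUE)
)

# Rows without a keyword get NA, so they can be dropped as the grep filter did.
result <- df[!is.na(df$words), ]
```

Note that the extracted text keeps the case found in the data ("Logistic"), not the case of the pattern.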
CodePudding user response:
Using R base functions:
> df$words <- gsub(".*(xoobst|logistic|intelligence).*", "\\1", df$skills)
> df
  id                     skills        words
1  1 this is a skill for xoobst       xoobst
2  2    artificial intelligence intelligence
3  3        logistic regression     logistic
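One caveat with the gsub/sub approach: when the pattern does not match, the input string is returned unchanged, which may be why a full sentence showed up in the question. A minimal base R sketch that returns NA for such rows instead, using regexpr and regmatches, is below. The fourth row is a made-up example of a non-matching entry.

```r
# Hypothetical data: row 4 is added to show a row with no keyword.
df <- data.frame(
  id = 1:4,
  skills = c("this is a skill for xoobst", "artificial intelligence",
             "logistic regression", "project management"),
  stringsAsFactors = FALSE
)
keyword <- "xoobst|logistic|intelligence"

# With no match, gsub() hands back the input unchanged.
unmatched <- gsub(sprintf(".*(%s).*", keyword), "\\1", "project management")

# regexpr() locates the first match in each string; -1 means no match.
m <- regexpr(keyword, df$skills, ignore.case = TRUE)
found <- as.vector(m) != -1L

# regmatches() returns only the matched substrings, one per matching row.
df$words <- NA_character_
df$words[found] <- regmatches(df$skills, m)
```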
CodePudding user response:
Using grep with sapply and strsplit.
df$words <- sapply(strsplit(df$skills, " "), function(x) grep(keyword, x, value = TRUE))
df
  id                     skills        words
1  1 this is a skill for xoobst       xoobst
2  2    artificial intelligence intelligence
3  3        logistic regression     logistic
This assumes that single keywords don't contain spaces.
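It also assumes that every row contains exactly one matching word; otherwise sapply returns a list rather than a character vector. A sketch of a variant that tolerates zero or several matches per row is below, using vapply and collapsing the hits into one string. The sample rows and the ", " separator are assumptions chosen for illustration.

```r
# Hypothetical data: row 2 holds two keywords, row 3 holds none.
df <- data.frame(
  id = 1:3,
  skills = c("this is a skill for xoobst",
             "logistic regression and artificial intelligence",
             "project management"),
  stringsAsFactors = FALSE
)
keyword <- "xoobst|logistic|intelligence"

df$words <- vapply(
  strsplit(df$skills, " ", fixed = TRUE),
  function(tokens) {
    # Keep every whitespace-separated word that matches a keyword.
    hits <- grep(keyword, tokens, value = TRUE, ignore.case = TRUE)
    # NA when nothing matched, otherwise all hits joined into one string.
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = ", ")
  },
  character(1)
)
```

vapply with character(1) guarantees a plain character column, so the result stays one value per row regardless of how many keywords a row contains.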