R tolower only within function

Time:01-27

I would like to remove words from a character vector. This is how I do it:

library(tm)
words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")
removeWords(tolower(words), tolower(words_to_remove))

This works nicely, but I would like the word "Intelligent" to be returned as it was, meaning "Intelligent" instead of "intelligent". Is there a way to apply tolower only within the call to removeWords?
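One way to get the effect of "tolower only inside removeWords" is to match case-insensitively instead of lowercasing the input. The sketch below is a hypothetical helper, `remove_words_ci`, built on base R `gsub()` with `ignore.case = TRUE` rather than on tm. It deletes matching words inside each string, which as far as I know is how removeWords behaves, and it assumes the words to remove contain no regex metacharacters.

```r
# Hypothetical helper (not part of tm): removes whole words case-insensitively,
# leaving everything else in each string in its original case.
# Assumes the entries of `to_remove` contain no regex metacharacters.
remove_words_ci <- function(x, to_remove) {
  # One alternation such as \b(the|This)\b, matched on word boundaries.
  pattern <- paste0("\\b(", paste(to_remove, collapse = "|"), ")\\b")
  gsub(pattern, "", x, ignore.case = TRUE, perl = TRUE)
}

words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")
cleaned <- remove_words_ci(words, words_to_remove)
print(cleaned)

# Matches are deleted inside longer strings too; surrounding spaces remain.
print(remove_words_ci("The Intelligent agent reads this", words_to_remove))
```

Matched elements become empty strings rather than disappearing, so `cleaned[cleaned != ""]` would be needed to drop them.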

Answer:

You could just use a base R approach with grepl here:

words_to_remove = c("the", "This")
pattern <- paste0("\\b", words_to_remove, "\\b", collapse="|")
words = c("the", "The", "Intelligent", "this", "This")

res <- grepl(pattern, words, ignore.case=TRUE)
words[!res]

[1] "Intelligent"


The trick here is the call to paste0, which generates the following pattern:

\bthe\b|\bThis\b

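Combined with ignore.case=TRUE, this pattern lets a single vectorized grepl call flag every element of words that contains any of the words to remove, in any capitalization, so words[!res] keeps only the remaining elements in their original case.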
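To see what the single pattern does, the sketch below reuses the answer's variable names, prints the generated pattern and compares case-sensitive with case-insensitive matching. The extra inputs "the cat" and "Theory" are illustrative only: because `\b` marks word edges rather than string edges, an element that merely contains a listed word is flagged as a whole.

```r
words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")

# Same construction as the answer: one alternation of word-bounded terms.
pattern <- paste0("\\b", words_to_remove, "\\b", collapse = "|")
writeLines(pattern)  # prints \bthe\b|\bThis\b

# Without ignore.case only exact-case matches are flagged ...
case_sensitive <- grepl(pattern, words)
# ... with it, "The" and "this" are flagged as well.
case_insensitive <- grepl(pattern, words, ignore.case = TRUE)
print(data.frame(words, case_sensitive, case_insensitive))

# \b marks word edges, not string edges: "the cat" matches, "Theory" does not.
print(grepl(pattern, c("the cat", "Theory"), ignore.case = TRUE))
```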

Answer:

Here is another option using base R's %in% operator:

words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")

words[!(tolower(words) %in% tolower(words_to_remove))]

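After both vectors are lowercased, %in% returns TRUE for every element of words that appears in words_to_remove. Negating that with ! selects the words to keep, and because the result indexes the original words vector, the kept words retain their original case.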
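The sketch below splits the one-liner into its steps so the intermediate logical mask is visible. The `mixed` vector is an illustrative input, not from the original: it shows that `%in%` compares whole elements, so unlike removeWords it will not strip a word out of a longer string.

```r
words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")

# Lowercase both sides only for the comparison; `words` itself is untouched.
mask <- tolower(words) %in% tolower(words_to_remove)
print(mask)

# Index the original vector, so kept elements retain their original case.
kept <- words[!mask]
print(kept)

# %in% compares whole elements: "The cat" is kept, "this" is dropped.
mixed <- c("The cat", "this")
mixed_kept <- mixed[!(tolower(mixed) %in% tolower(words_to_remove))]
print(mixed_kept)
```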
