I have a dataset that is represented with the example set below:
FirstName Letter
Alexsmith A1
ThegreatAlex A6
AlexBobJones1 A7
Bobsmiles222 A1
Christopher A9
Christofer A6
I want to be able to search this dataset and remove all rows that contain, for example, "Alex" (or "alex", "aLex", etc.) anywhere in the FirstName value. I have tried using grep("Alex") but have stumbled on combining dplyr with base R, and grep seems to want a vector, not a data table.
Thanks! Happy to clarify any questions
CodePudding user response:
dat <- structure(list(FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
"Bobsmiles222", "Christopher", "Christofer"), Letter = c("A1",
"A6", "A7", "A1", "A9", "A6")), class = "data.frame", row.names = c(NA, -6L))
# FirstName Letter
#1 Alexsmith A1
#2 ThegreatAlex A6
#3 AlexBobJones1 A7
#4 Bobsmiles222 A1
#5 Christopher A9
#6 Christofer A6
Here is one way:
dat[-grep("[Aa][Ll][Ee][Xx]", dat$FirstName), ]
# FirstName Letter
#4 Bobsmiles222 A1
#5 Christopher A9
#6 Christofer A6
Thanks to Ritchie Sacramento for the hint that grep accepts an ignore.case argument. Weird that I did not even notice this argument before. So we can do:
dat[grep("alex", dat$FirstName, ignore.case = TRUE, invert = TRUE), ]
With invert = TRUE, we don't need - before grep for negative indexing. This is safer, in case of no match.
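To illustrate the no-match case, here is a small sketch using the same dat and a pattern ("zoe", chosen purely for illustration) that matches none of the names. Negative indexing with an empty index selects zero rows, while invert = TRUE keeps all of them. The names no_match, dropped_all and kept_all are made up for this example.

```r
dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6")
)

# "zoe" is a hypothetical pattern that matches no row, so grep returns integer(0)
no_match <- grep("zoe", dat$FirstName, ignore.case = TRUE)

# -integer(0) is still integer(0), so this selects zero rows instead of all rows
dropped_all <- dat[-no_match, ]
nrow(dropped_all)
# [1] 0

# invert = TRUE returns the indices of the non-matching rows, here all six
kept_all <- dat[grep("zoe", dat$FirstName, ignore.case = TRUE, invert = TRUE), ]
nrow(kept_all)
# [1] 6
```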
CodePudding user response:
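If you're not comfortable using grep, you can also use str_detect from the stringr package to get the same outcome. Negating the logical vector with ! avoids which() and negative indexing, which would return zero rows if nothing matched:
library(stringr)
dat[!str_detect(dat[, "FirstName"], fixed("alex", ignore_case = TRUE)), ]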
The first answer gives the same result; it's a matter of personal preference, but I find this syntax more readable.
FirstName Letter
4 Bobsmiles222 A1
5 Christopher A9
6 Christofer A6
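Since the question mentions dplyr, the same filter could also be sketched inside a pipeline using grepl, which returns a logical vector per row rather than indices. This assumes dplyr is installed; result is just an illustrative variable name.

```r
library(dplyr)

dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6")
)

# grepl gives TRUE for rows containing "alex" in any case; ! keeps the others
result <- dat %>%
  filter(!grepl("alex", FirstName, ignore.case = TRUE))

nrow(result)
# [1] 3
```

Because filter works on a logical condition, a pattern that matches nothing simply keeps every row.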