Remove data frame rows if keys contain a case-insensitive substring

Time:07-14

I have a dataset like the following:

FirstName      Letter
Alexsmith      A1
ThegreatAlex   A6
AlexBobJones1  A7
Bobsmiles222   A1
Christopher    A9
Christofer     A6

I want to remove all rows whose FirstName contains, for example, "Alex" (or "alex", "aLex", etc.) anywhere in the value. I have tried grep("Alex"), but I got stuck combining dplyr with base R, and grep seems to want a vector rather than a data frame.
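grep() and grepl() do operate on character vectors, but a single data frame column is already such a vector, so you can pass dat$FirstName instead of the whole data frame. A minimal sketch, rebuilding the sample data by hand (the name dat is just for illustration):

```r
# Sample data from the question, rebuilt by hand with character columns
dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6"),
  stringsAsFactors = FALSE
)

# dat$FirstName is a plain character vector, which is what grepl() expects
has_alex <- grepl("alex", dat$FirstName, ignore.case = TRUE)
print(has_alex)
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Negate the logical vector to keep only the rows that do NOT contain "alex"
print(dat[!has_alex, ])
```

Logical indexing with ! also behaves sensibly when nothing matches: every element is FALSE, so every row is kept.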

Thanks! Happy to clarify any questions.

Answer:

dat <- structure(list(FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1", 
"Bobsmiles222", "Christopher", "Christofer"), Letter = c("A1", 
"A6", "A7", "A1", "A9", "A6")), class = "data.frame", row.names = c(NA, -6L))
#      FirstName Letter
#1     Alexsmith     A1
#2  ThegreatAlex     A6
#3 AlexBobJones1     A7
#4  Bobsmiles222     A1
#5   Christopher     A9
#6    Christofer     A6

Here is one way:

dat[-grep("[Aa][Ll][Ee][Xx]", dat$FirstName), ]
#     FirstName Letter
#4 Bobsmiles222     A1
#5  Christopher     A9
#6   Christofer     A6
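The character classes ([Aa], [Ll], ...) let each letter match either case. Another way to get the same case-insensitivity, shown here as a swapped-in alternative rather than part of this answer, is to lower-case the column first and then search for a literal lowercase string (this assumes plain ASCII names, since tolower() is locale-dependent):

```r
dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6"),
  stringsAsFactors = FALSE
)

# Fold case on the data side, then do a literal (non-regex) substring search
matches <- grepl("alex", tolower(dat$FirstName), fixed = TRUE)

# Logical negation keeps the non-matching rows
kept <- dat[!matches, ]
print(kept)
```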

Thanks to Ritchie Sacramento for pointing out that grep accepts an ignore.case argument; oddly, I had never noticed it before. So we can write:

dat[grep("alex", dat$FirstName, ignore.case = TRUE, invert = TRUE), ]

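With invert = TRUE, grep returns the indices of the non-matching elements, so we don't need a leading - for negative indexing. This is also safer when nothing matches: grep then returns integer(0), and dat[-integer(0), ] selects zero rows instead of keeping all of them.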
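A small sketch of that no-match pitfall, using a made-up pattern "zzz" that matches none of the sample names:

```r
dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6"),
  stringsAsFactors = FALSE
)

# A pattern that matches none of the names, so grep() returns integer(0)
idx <- grep("zzz", dat$FirstName, ignore.case = TRUE)
print(idx)

# Negative indexing with an empty index selects zero rows, not all rows
n_negative <- nrow(dat[-idx, ])
print(n_negative)

# invert = TRUE returns the non-matching indices directly, so all 6 rows survive
n_invert <- nrow(dat[grep("zzz", dat$FirstName, ignore.case = TRUE, invert = TRUE), ])
print(n_invert)
```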

Answer:

If you're not comfortable using grep, you can also use str_detect() from the stringr package to get the same outcome:

library(stringr)

dat[!str_detect(dat[, "FirstName"], fixed("alex", ignore_case = TRUE)), ]

This gives the same result as the first answer. Which one to use is a matter of personal preference, but I find this syntax more readable.

    FirstName Letter
4 Bobsmiles222     A1
5  Christopher     A9
6   Christofer     A6
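Since the question mentions dplyr, here is a sketch of the same filter inside a dplyr pipeline, assuming dplyr is installed. Inside filter(), FirstName refers to the column vector, so the base R function grepl() can be used there directly; that is one way to combine dplyr with base R:

```r
library(dplyr)

dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6"),
  stringsAsFactors = FALSE
)

# grepl() is TRUE for names containing "alex" in any case; ! drops those rows
kept <- dat %>%
  filter(!grepl("alex", FirstName, ignore.case = TRUE))

print(kept)
```

With stringr loaded, filter(!str_detect(FirstName, regex("alex", ignore_case = TRUE))) is the equivalent tidyverse spelling. Note that filter() does not keep the original row names.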