How to remove certain characters from a dataframe in R?

Time:08-06

I am attempting to remove rows that contain certain characters. In this case, I want to drop rows containing * or -, but for - only where multiple dashes appear next to each other (i.e., row 6). The solution I am looking for either removes rows 4 and 6 entirely or changes them to NA. I have tried grepl, gsub, and replace, but something isn't working correctly.

Here is the example dataframe.

df <- structure(list(text = c("1", "3", "5", "HR*", "12-2", "--")),
                class = "data.frame", row.names = c(NA, -6L))

Here is the desired outcome.

df <- structure(list(text = c("1", "3", "5", "12-2")),
                class = "data.frame", row.names = c(NA, -4L))
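A minimal base-R sketch of the matching rule itself, assuming the rows to drop are exactly those containing a * or a run of two or more dashes. The quantifier -{2,} spells out "multiple dashes next to each other"; for detection it matches the same strings as --, since any longer run of dashes contains --. The single dash in 12-2 is not matched.

```r
# Same data as above, built with data.frame() instead of structure()
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"))

# [*] is a literal asterisk; -{2,} is two or more consecutive dashes
bad <- grepl("[*]|-{2,}", df$text)
bad
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE
```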

Answer:

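If you've used grepl and had no luck, it's probably due to escaping (* is a special character in regex, so it has to be written as \\* or [*]) or to drop: subsetting the rows of a one-column data frame without drop = FALSE returns a plain vector instead of a data frame. Does this work?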

df <- df[!grepl("\\*|--", df$text), , drop=FALSE]

> df
  text
1    1
2    3
3    5
5 12-2
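The two pitfalls mentioned above, escaping and drop, can be checked in isolation. A minimal base-R sketch using the same example data; the variable names p1, p2, p3 and keep are only illustrative.

```r
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"))

# Three equivalent ways to match a literal asterisk
p1 <- grepl("\\*", df$text)              # escape the metacharacter
p2 <- grepl("[*]", df$text)              # put it inside a character class
p3 <- grepl("*", df$text, fixed = TRUE)  # skip regex parsing entirely
stopifnot(identical(p1, p2), identical(p2, p3))

# Row subsetting of a one-column data frame drops to a vector by default
keep <- !grepl("\\*|--", df$text)
is.data.frame(df[keep, ])                # FALSE: a plain vector
is.data.frame(df[keep, , drop = FALSE])  # TRUE: still a data frame
```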

Answer:

We may use str_detect from stringr to keep only the rows that contain a digit:

library(dplyr)
library(stringr)
df %>% 
   filter(str_detect(text, '\\d+'))
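For readers without dplyr and stringr, a base-R sketch of the same "keep rows that contain a digit" idea. Note that this is a looser rule than the one in the question: it keeps any entry with at least one digit, so a hypothetical value such as "HR*1" would survive.

```r
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"))

# \\d matches any digit; keep rows where at least one is present
kept <- df[grepl("\\d", df$text), , drop = FALSE]
kept
```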

Or, if the rule is specifically about the characters * and --, drop the rows that match them with negate = TRUE:

df %>% 
  filter(str_detect(text, "--|[*]", negate = TRUE))
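Neither answer shows the other outcome the question allows, setting the offending rows to NA instead of dropping them. A minimal base-R sketch, reusing the pattern from the first answer:

```r
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"))

# Overwrite matching entries with NA; the data frame keeps all 6 rows
df$text[grepl("\\*|--", df$text)] <- NA
df
#   text
# 1    1
# 2    3
# 3    5
# 4 <NA>
# 5 12-2
# 6 <NA>
```

Inside a dplyr pipeline, mutate(text = replace(text, str_detect(text, "--|[*]"), NA)) should give the same result.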