I am trying to remove rows that contain certain characters. In this case, I want to remove rows containing * or - (but only where multiple dashes sit next to each other, i.e., row 6). The solution I am looking for either removes rows 4 and 6 entirely or changes them to NA. I have tried grepl, gsub, and replace, but something isn't working correctly.
Here is the example dataframe.
df <- structure(list(text = c("1", "3", "5", "HR*", "12-2", "--")),
                class = "data.frame", row.names = c(NA, -6L))
Here is the desired outcome.
df <- structure(list(text = c("1", "3", "5", "12-2")),
                class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
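If you've used grepl and had no luck, it's probably due to escaping (* is a special character in regex) or to a missing drop = FALSE (without it, subsetting a one-column data frame returns a vector instead of a data frame). Does this work?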
df <- df[!grepl("\\*|--", df$text), , drop=FALSE]
> df
  text
1    1
2    3
3    5
5 12-2
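If you would rather change those rows to NA than drop them, the same pattern can be used for assignment. This is only a minimal sketch in base R; the names bad and df_na are placeholders introduced here, not something from the question.

```r
# Example data from the question
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"),
                 stringsAsFactors = FALSE)

# TRUE for rows containing a literal * or two consecutive dashes
bad <- grepl("\\*|--", df$text)

# Work on a copy so the original df is left untouched
df_na <- df
df_na$text[bad] <- NA
df_na
```

Rows 4 and 6 become NA, while "12-2" is kept because it contains only a single dash.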
CodePudding user response:
We may use str_detect to keep only the rows that contain at least one digit:
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(text, '\\d+'))
Or, if it is specific to the characters * and --:
df %>%
  filter(str_detect(text, "--|[*]", negate = TRUE))
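For the NA variant mentioned in the question, one possible sketch with dplyr replaces the matching values rather than filtering them out. It assumes text is a character column; df_na is just an illustrative name.

```r
library(dplyr)
library(stringr)

# Example data from the question
df <- data.frame(text = c("1", "3", "5", "HR*", "12-2", "--"),
                 stringsAsFactors = FALSE)

# Replace values containing -- or a literal * with NA, keep the rest as-is
df_na <- df %>%
  mutate(text = if_else(str_detect(text, "--|[*]"), NA_character_, text))
df_na
```

if_else is used instead of ifelse because it checks that both branches are the same type, which is why NA_character_ is spelled out.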