Remove whole sentences from dataset

Time:11-07

I have a dataset that looks like this:

output
Others. Specify (separate by comma if there is more than one):
Everyone cries/has feelings,Others. Specify (separate by comma if there is more than one):
Family upbringing
Everyone cries/has feelings,Others. Specify (separate by comma if there is more than one):
Did not say

How can I remove the sentence "Others. Specify (separate by comma if there is more than one):" from the dataset? I've tried

gsub("Others. Specify (separate by comma if there is more than one):", "", datset$output)

and str_remove_all() but it didn't work.

CodePudding user response:

You can get the desired result by adding fixed = TRUE, which makes gsub() match the pattern literally instead of interpreting it as a regular expression:

gsub("Others. Specify (separate by comma if there is more than one):", 
     "", 
     datset$output, 
     fixed = TRUE)
#> [1] ""                             "Everyone cries/has feelings,"
#> [3] "Family upbringing"            "Everyone cries/has feelings,"
#> [5] "Did not say"
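
Since the question also mentions str_remove_all(), the same literal matching should be available in stringr by wrapping the pattern in fixed(). This is a minimal sketch, assuming the stringr package is installed; the names `output` and `phrase` are illustrative stand-ins for datset$output and the sentence to remove:

```r
library(stringr)

# Sample values copied from the question; stands in for datset$output.
output <- c(
  "Others. Specify (separate by comma if there is more than one):",
  "Everyone cries/has feelings,Others. Specify (separate by comma if there is more than one):",
  "Family upbringing"
)

phrase <- "Others. Specify (separate by comma if there is more than one):"

# fixed() asks stringr to compare the pattern literally rather than as a regex.
cleaned <- str_remove_all(output, fixed(phrase))
print(cleaned)
```

Note that the trailing comma in "Everyone cries/has feelings," is left in place, just as in the gsub() output above.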

A second option is to escape the regex special characters, which in your case are the . and, in particular, the ( and ). In a regex, . matches any single character and () creates a capturing group, so the parentheses in your pattern are never matched literally. To match a literal (, write \\( in the R string:

gsub("Others\\. Specify \\(separate by comma if there is more than one\\):", "", datset$output)
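
To see why the original call changed nothing, it can help to test each form of the pattern against the phrase itself. This is a small check in base R; the variable names are illustrative only:

```r
phrase <- "Others. Specify (separate by comma if there is more than one):"

# Unescaped: "(" and ")" delimit a group, so the regex expects
# "Specify separate ... one:" with no literal parentheses in the text.
as_regex <- grepl(phrase, phrase)

# Literal comparison: every character is matched as written.
as_fixed <- grepl(phrase, phrase, fixed = TRUE)

# Escaped regex: "\\." and "\\(" / "\\)" match the literal characters.
as_escaped <- grepl(
  "Others\\. Specify \\(separate by comma if there is more than one\\):",
  phrase
)

print(c(as_regex = as_regex, as_fixed = as_fixed, as_escaped = as_escaped))
```

Only the unescaped regex fails to match, which is why gsub() returned the input unchanged.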

DATA

datset <- data.frame(
  output = c(
    "Others. Specify (separate by comma if there is more than one):",
    "Everyone cries/has feelings,Others. Specify (separate by comma if there is more than one):", "Family upbringing",
    "Everyone cries/has feelings,Others. Specify (separate by comma if there is more than one):", "Did not say"
  )
)