I have some data, which looks like this:
df <- data.frame(
  'col' = c(
    'some words [remove this] more words',
    'some other words [I want this gone] this is fine',
    '[nope. get rid of it] but keep this',
    'all of this is fine',
    '[but] this [should] go [away]')
)
col
1 some words [remove this] more words
2 some other words [I want this gone] this is fine
3 [nope. get rid of it] but keep this
4 all of this is fine
5 [but] this [should] go [away]
I want to remove all of the square brackets and everything in between them.
goal_df <- data.frame(
  'col' = c(
    'some words more words',
    'some other words this is fine',
    'but keep this',
    'all of this is fine',
    'this go')
)
col
1 some words more words
2 some other words this is fine
3 but keep this
4 all of this is fine
5 this go
I thought that using regex (which is my worst skill in programming) would be the solution, but I can't seem to get it to work. I'm using df$col <- gsub("[.*?]", "", df$col), but that doesn't remove the bracketed text.
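A likely reason the attempt above fails: in a regex, [ and ] delimit a character class (bracket expression), so "[.*?]" matches one single '.', '*' or '?' character rather than text enclosed in brackets. A minimal sketch showing this on one of the example strings (the variable names are just for illustration):

```r
x <- "[nope. get rid of it] but keep this"

# "[.*?]" is a bracket expression: it matches one literal ".", "*" or "?",
# so the only thing removed here is the period; the brackets stay.
unescaped <- gsub("[.*?]", "", x)
print(unescaped)  # "[nope get rid of it] but keep this"

# Escaping the brackets makes them literal, and the lazy ".*?" then matches
# the shortest run of characters up to the first closing "]".
escaped <- gsub("\\[.*?\\]", "", x)
print(escaped)    # " but keep this"
```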
Answer:
We may match the [, followed by one or more characters that are not ], followed by ] and an optional space, as the pattern, and replace it with blank ("") in gsub. The [ and ] are metacharacters, so escape them (\\).
df$col <- trimws(gsub("\\[[^]]+\\]\\s?", "", df$col))
Output:
> df
col
1 some words more words
2 some other words this is fine
3 but keep this
4 all of this is fine
5 this go
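To check the first answer end to end, here is a self-contained sketch that rebuilds the example data, applies the same pattern, and compares the result with the goal column. stringsAsFactors = FALSE is set explicitly (an addition, not in the original) so col stays character on older R versions too.

```r
df <- data.frame(
  col = c(
    "some words [remove this] more words",
    "some other words [I want this gone] this is fine",
    "[nope. get rid of it] but keep this",
    "all of this is fine",
    "[but] this [should] go [away]"
  ),
  stringsAsFactors = FALSE  # keep col as character, not factor
)

goal <- c(
  "some words more words",
  "some other words this is fine",
  "but keep this",
  "all of this is fine",
  "this go"
)

# Pattern pieces:
#   \\[      a literal "["
#   [^]]+    one or more characters that are not "]"
#   \\]      a literal "]"
#   \\s?     at most one following whitespace character
# trimws() then drops any space left at the start or end of each string.
df$col <- trimws(gsub("\\[[^]]+\\]\\s?", "", df$col))

print(df)
stopifnot(identical(df$col, goal))
```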
Answer:
A slightly easier-to-parse solution uses the quantifier *, made non-greedy by ?:
gsub("\\s?\\[.*?\\]", "", df$col)
[1] "some words more words" "some other words this is fine" " but keep this"
[4] "all of this is fine" " this go"
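The ? after * is what makes this work on rows with several bracketed groups. A small sketch contrasting the greedy and lazy forms on the last example string:

```r
x <- "[but] this [should] go [away]"

# Greedy: ".*" runs as far as it can, from the first "[" to the last "]",
# so the whole string is consumed by a single match.
greedy <- gsub("\\s?\\[.*\\]", "", x)

# Lazy: ".*?" stops at the first "]", so each bracketed group is its own match.
lazy <- gsub("\\s?\\[.*?\\]", "", x)

print(greedy)  # ""
print(lazy)    # " this go"
```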
To remove the remaining leading or trailing whitespace, wrap the call in trimws().
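Combining this pattern with trimws() should reproduce the goal column exactly; a minimal sketch on the example strings (the names x and cleaned are illustrative):

```r
x <- c(
  "some words [remove this] more words",
  "some other words [I want this gone] this is fine",
  "[nope. get rid of it] but keep this",
  "all of this is fine",
  "[but] this [should] go [away]"
)

# Remove each bracketed group together with one optional preceding space,
# then trim whitespace left at either end of each string.
cleaned <- trimws(gsub("\\s?\\[.*?\\]", "", x))
print(cleaned)
```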