Dealing with square brackets in regex

Time:12-21

I have some data, which looks like this:

df <-
  data.frame(
    'col' = c(
      'some words [remove this] more words',
      'some other words [I want this gone] this is fine',
      '[nope.  get rid of it] but keep this',
      'all of this is fine',
      '[but] this [should] go [away]')
    )

                                               col
1              some words [remove this] more words
2 some other words [I want this gone] this is fine
3              [nope  get rid of it] but keep this
4                              all of this is fine
5                    [but] this [should] go [away]

I want to remove all of the square brackets and everything in between them.

goal_df <-
  data.frame(
    'col' = c(
      'some words more words',
      'some other words this is fine',
      'but keep this',
      'all of this is fine',
      'this go')
  )

                            col
1         some words more words
2 some other words this is fine
3                 but keep this
4           all of this is fine
5                       this go

I thought that using regex (which is my worst skill in programming) would be the solution, but I can't seem to get that to work. I'm using df$col <- gsub( "[.*?]", "", df$col) but that doesn't make any changes.

CodePudding user response:

We can match a literal [, followed by one or more characters that are not ], followed by a literal ] and an optional whitespace character, and replace the match with an empty string ("") in gsub. Because [ and ] are regex metacharacters, the literal brackets must be escaped (\\).

df$col <- trimws(gsub("\\[[^]]+\\]\\s?", "", df$col))

-output

> df
                            col
1         some words more words
2 some other words this is fine
3                 but keep this
4           all of this is fine
5                       this go
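
As a rough illustration of why the attempt in the question does not work, the sketch below compares the unescaped and escaped patterns on a hypothetical vector `x` (two strings borrowed from the question's data; the names `x`, `unescaped` and `escaped` are made up for this example). Unescaped, `[.*?]` is a bracket expression that matches a single literal `.`, `*` or `?`, so it never matches a bracketed span.

```r
# Hypothetical sample vector, borrowed from the question's data
x <- c("some words [remove this] more words",
       "[nope.  get rid of it] but keep this")

# Unescaped: [.*?] is a bracket expression matching one of '.', '*' or '?',
# so only such single characters are removed
unescaped <- gsub("[.*?]", "", x)

# Escaped: a literal '[', one or more non-']' characters, a literal ']',
# then an optional whitespace character
escaped <- trimws(gsub("\\[[^]]+\\]\\s?", "", x))

print(unescaped)
print(escaped)
```

With the unescaped pattern the brackets and their contents survive; the only thing that can disappear is a stray `.`, `*` or `?` anywhere in the string.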

CodePudding user response:

A slightly easier-to-parse solution uses the quantifier * made non-greedy by ?:

gsub("\\s?\\[.*?\\]", "", df$col)
[1] "some words more words"         "some other words this is fine" " but keep this"               
[4] "all of this is fine"           " this go"

To remove leading or trailing whitespace, use trimws.
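
A minimal sketch of that combination, assuming the non-greedy pattern is written without a trailing space and using a hypothetical vector `x` with two strings from the question (`x` and `cleaned` are names invented for this example):

```r
# Hypothetical sample vector, borrowed from the question's data
x <- c("[nope.  get rid of it] but keep this",
       "[but] this [should] go [away]")

# Remove an optional leading whitespace character plus each bracketed span
# (non-greedy, so every [..] pair is matched separately), then strip any
# whitespace left at the start or end of the string
cleaned <- trimws(gsub("\\s?\\[.*?\\]", "", x))

print(cleaned)
```

Without the `?` after `.*`, the match would run greedily from the first `[` to the last `]` in each string, which would wipe out the words between separate bracketed spans.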
