Remove trailing brackets in a string

Time:05-22

I would appreciate some help with removing or replacing the trailing square brackets, inner quotes, and slashes in character data in R, preferably using dplyr.

Sample:

df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

What I have tried:

gsub("[[]]", "", df) # Throws error
df %>%
  str_replace("[[]]", "") # Also throws error

What the data should look like:

"Mamie Smith", "Screamin' Jay Hawkins"

Would love your assistance.

CodePudding user response:

In base R we can make use of the trimws function (its whitespace argument needs R 3.6.0 or later).

If we want to strip every non-word character from both ends:

trimws(df, whitespace = "\\W+")
[1] "Mamie Smith"           "Screamin' Jay Hawkins"

But if we only want to delete the square brackets and quotes at the ends while leaving other punctuation, spaces, etc. intact:

trimws(df, whitespace = "[\\]\\[\"']+")
[1] "Mamie Smith"           "Screamin' Jay Hawkins"
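To make the difference between the two trimws patterns concrete, here is a small sketch. The third input, "['Who?']", is a made-up value added only for illustration; it is not part of the question's data.

```r
# Hypothetical extra element "['Who?']" added to show where the patterns differ.
x <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]", "['Who?']")

# Strip every non-word character from both ends (also eats the trailing "?").
strip_nonword <- trimws(x, whitespace = "\\W+")

# Strip only square brackets and quotes from both ends (keeps the "?").
strip_wrappers <- trimws(x, whitespace = "[\\]\\[\"']+")

print(strip_nonword)
print(strip_wrappers)
```

In both cases the apostrophe inside "Screamin'" survives, because trimws only touches the start and end of each string.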

CodePudding user response:

Base R:

sapply(regmatches(df, regexec('(\\w.*)(.*\\w)', df)), "[", 1)

[1] "Mamie Smith"           "Screamin' Jay Hawkins"

OR

We could use str_extract from the stringr package with this regex:

library(stringr)

str_extract(df, '(\\w.*)(.*\\w)')

[1] "Mamie Smith"           "Screamin' Jay Hawkins"

CodePudding user response:

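Since [ and ] are regex metacharacters, you need to escape them with a double backslash (\\) inside an R string. The double quote is not a regex metacharacter, but it has to be escaped when it appears inside a double-quoted R string.

Here's some alternative code. Note that it removes only the brackets and double quotes, so the single quotes around Mamie Smith stay in place: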

gsub('\\"|\\[|\\]', "", df)
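Building on the escaped pattern above, one possible variant anchors each escaped bracket together with its adjacent quote to the start or end of the string. This is only a sketch, assuming every value is wrapped as [ + quote ... quote + ], as in the sample data:

```r
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

# ^\[["']  matches an opening bracket plus quote at the start of the string;
# ["']\]$  matches a quote plus closing bracket at the end of the string.
cleaned <- gsub("^\\[[\"']|[\"']\\]$", "", df)

print(cleaned)
```

Because both alternatives are anchored, the apostrophe in the middle of "Screamin' Jay Hawkins" is left alone.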

CodePudding user response:

To pair up the square brackets with the accompanying type of quote, you can use:

\[(["'])(.*?)\1]

Explanation

  • \[ Match [
  • (["']) Capture group 1, capture either " or '
  • (.*?) Capture group 2, match as few characters as possible
  • \1 Backreference to group 1 to match the same type of quote
  • ] Match ]


df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")
gsub("\\[([\"'])(.*?)\\1]", "\\2", df)

Output

[1] "Mamie Smith"           "Screamin' Jay Hawkins"
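Since the question asked for a dplyr approach, the same pattern can be dropped into mutate. This is a minimal sketch, assuming dplyr is installed; the tibble and the column name artist are hypothetical and stand in for whatever data frame holds the strings.

```r
library(dplyr)

# Hypothetical data frame; the column name `artist` is an assumption.
artists <- tibble(artist = c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]"))

# Replace the whole bracket-and-quote wrapper with capture group 2 (the text inside).
cleaned <- artists %>%
  mutate(artist = gsub("\\[([\"'])(.*?)\\1]", "\\2", artist))

print(cleaned$artist)
```

gsub is vectorised, so it works inside mutate without any extra wrapping.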