Remove trailing brackets in a string

Time:05-25

I would appreciate some help with removing or replacing the leading and trailing square brackets, inner quotes, and slashes in character data in R, preferably using dplyr.

Sample:

df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

What I have tried:

gsub("[[]]", "", df) # Throws error
df %>%
  str_replace("[[]]", "") # Also throws error

What the data should look like:

"Mamie Smith", "Screamin' Jay Hawkins"

Would love your assistance.

CodePudding user response:

In base R we can make use of the trimws function.

If we are not interested in the non-word characters at either end:

trimws(df, whitespace = "\\W+")
[1] "Mamie Smith"           "Screamin' Jay Hawkins"

But if we only want to delete square brackets and quotes while leaving other punctuation, spaces, etc. in place, then:

trimws(df, whitespace = "[\\]\\[\"']+")
[1] "Mamie Smith"           "Screamin' Jay Hawkins"
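
To make the role of the whitespace argument more concrete: trimws behaves roughly like two anchored sub() calls, one for the leading run and one for the trailing run of the given characters. The sketch below is an approximation for illustration only; strip_edges is a hypothetical helper name, not a base R function, and perl = TRUE is assumed so that the backslash escapes inside the brackets are accepted.

```r
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

# Hypothetical helper: remove a run of the given characters from both ends.
strip_edges <- function(x, chars = "[\\]\\[\"']") {
  x <- sub(paste0("^", chars, "+"), "", x, perl = TRUE)  # leading run
  sub(paste0(chars, "+$"), "", x, perl = TRUE)           # trailing run
}

strip_edges(df)
# [1] "Mamie Smith"           "Screamin' Jay Hawkins"
```

Because both patterns are anchored, the apostrophe inside Screamin' is never touched.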

CodePudding user response:

Base R:

sapply(regmatches(df, regexec('(\\w.*)(.*\\w)', df)), "[", 1)

[1] "Mamie Smith"           "Screamin' Jay Hawkins"

OR

We could use str_extract from stringr package with this regex:

library(stringr)

str_extract(df, '(\\w.*)(.*\\w)')

[1] "Mamie Smith"           "Screamin' Jay Hawkins"

CodePudding user response:

To pair up the square brackets with the accompanying type of quote, you can use:

\[(["'])(.*?)\1]

Explanation

  • \[ Match [
  • (["']) Capture group 1, capture either " or '
  • (.*?) Capture group 2, match as few characters as possible
  • \1 Backreference to group 1 to match the same type of quote
  • ] Match ]

In the replacement use the value of capture group 2 using \\2

df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")
gsub("\\[([\"'])(.*?)\\1]", "\\2", df)

Output

[1] "Mamie Smith"           "Screamin' Jay Hawkins"
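
One consequence of the backreference is that an element whose opening and closing quotes differ is left unchanged, since \1 cannot match. A small sketch of that behavior; the third element is made-up test data, and perl = TRUE is a choice made for this sketch that the call above does not use.

```r
x <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]", "['mismatched\"]")

# Same pattern as above; only elements with matching quote types are rewritten.
cleaned <- gsub("\\[([\"'])(.*?)\\1]", "\\2", x, perl = TRUE)
print(cleaned)
```

The first two elements come back as Mamie Smith and Screamin' Jay Hawkins, while the third keeps its brackets and quotes.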

CodePudding user response:

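Since [ and ] are regex metacharacters, you need to escape them with a double backslash (\\) in an R string. The double quote is not a metacharacter, so escaping it is harmless but not required inside a single-quoted string.

Here is an alternative: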

gsub('\\"|\\[|\\]', "", df)
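
Note that this pattern only targets double quotes and square brackets, so the single quotes around the first element survive. The sketch below shows what it returns on the sample data, followed by one possible variant (an illustration, not part of the original answer) that also strips single quotes but only in runs at the start or end of the string.

```r
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

# The pattern above: removes every ", [ and ] wherever they occur.
partial <- gsub('\\"|\\[|\\]', "", df)
print(partial)
# [1] "'Mamie Smith'"         "Screamin' Jay Hawkins"

# Variant: also drop single quotes, but only at the two ends of each string.
edges <- gsub("^(\\[|\"|')+|(\\]|\"|')+$", "", df)
print(edges)
# [1] "Mamie Smith"           "Screamin' Jay Hawkins"
```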

CodePudding user response:

Another, relatively easy, regex solution is this:

library(dplyr)
data.frame(df) %>%
  mutate(df = gsub("\\[\\W+|\\W+\\]", "", df))
                     df
1           Mamie Smith
2 Screamin' Jay Hawkins

Here we remove any non-alphanumeric character (\\W+) occurring one or more times, on the condition that it is preceded by an opening square bracket OR (|) followed by a closing square bracket.

Alternatively, to borrow from @TaerJae but greatly simplified:

library(stringr)
data.frame(df) %>%
  mutate(df = str_extract(df, '\\w.*\\w'))

Here we simply focus on the alphanumeric characters (\\w) at either end of the string, while allowing any characters (.*) to occur in between them, thus capturing, for example, the apostrophe in Screamin' and the whitespace.
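
One caveat with anchoring on word characters: a value that legitimately ends in punctuation loses it. The sketch below uses base R's regmatches and regexpr instead of str_extract so that it runs without extra packages; the second element is made-up test data.

```r
x <- c("['Mamie Smith']", "['Wham!']")

# Extract from the first word character to the last word character.
word_span <- regmatches(x, regexpr("\\w.*\\w", x))
print(word_span)
# [1] "Mamie Smith" "Wham"

# Trimming only brackets and quotes keeps the trailing "!".
trimmed <- trimws(x, whitespace = "[]['\"]")
print(trimmed)
# [1] "Mamie Smith" "Wham!"
```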

CodePudding user response:

To match ] inside a bracket expression [], it needs to come first, as in []], or be escaped in other positions. The quote character that delimits the R string needs to be escaped when it appears inside the pattern, as in "[\"]", or the other quote type can delimit the string, as in '["]'. The example strings contain no slashes (the backslashes shown are only escaping ").

gsub("[]['\"]", "", df)
#[1] "Mamie Smith"          "Screamin Jay Hawkins"
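
The placement rule for ] can be checked in isolation with grepl. This is a small sketch on made-up inputs; the escaped form is shown with perl = TRUE, which is an assumption made here so that backslash escapes inside the brackets are unambiguous.

```r
x <- c("a]", "a[", "ab")

# "]" placed first inside the brackets is taken literally.
only_close <- grepl("[]]", x)
print(only_close)
# [1]  TRUE FALSE FALSE

# A set holding both brackets: "]" first, then "[".
either <- grepl("[][]", x)
print(either)
# [1]  TRUE  TRUE FALSE

# The same set written with escapes, using the PCRE engine.
either_escaped <- grepl("[\\[\\]]", x, perl = TRUE)
print(either_escaped)
# [1]  TRUE  TRUE FALSE
```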

Another option, which avoids escaping " or ', is to use raw character constants r"(...)" (available since R 4.0.0).

gsub(r"([]["'])", "", df)
#[1] "Mamie Smith"          "Screamin Jay Hawkins"

To limit the match to the ends of the string, anchor the pattern with ^ (start) and $ (end).

gsub("^[]['\"]*|[]['\"]*$", "", df)
#[1] "Mamie Smith"           "Screamin' Jay Hawkins"

Alternatively, trimws can be used.

trimws(df, "both", "[]['\"]")
#[1] "Mamie Smith"           "Screamin' Jay Hawkins"