I would appreciate some help removing or replacing the surrounding square brackets, inner quotes and backslashes in character data in R, preferably using dplyr.
Sample:
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")
What I have tried:
gsub("[[]]", "", df) # Throws error
df %>%
str_replace("[[]]", "") # Also throws error
What the data should look like:
"Mamie Smith", "Screamin' Jay Hawkins"
Would love your assistance.
CodePudding user response:
In base R we can make use of the trimws function.
If we are not interested in the non-word parts:
trimws(df, whitespace = "\\W+")
[1] "Mamie Smith" "Screamin' Jay Hawkins"
But if we are only interested in deleting square brackets and quotes while leaving other punctuation, spaces, etc., then:
trimws(df, whitespace = "[\\]\\[\"']+")
[1] "Mamie Smith" "Screamin' Jay Hawkins"
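To see where the two patterns differ, here is a small sketch on a made-up string (x is hypothetical, not from the question) that has punctuation just inside the closing quote. trimws treats whitespace as a pattern for one character and repeats it itself, so a single character class is enough here. The whitespace argument needs R 3.6.0 or later.

```r
# Hypothetical input with punctuation just inside the closing quote
x <- "['Hello, world!']"

# \W strips every leading/trailing non-word character, including the "!"
strip_nonword <- trimws(x, whitespace = "\\W")

# The explicit class strips only brackets and quotes, so the "!" survives
strip_brackets <- trimws(x, whitespace = "[\\]\\[\"']")

print(strip_nonword)   # "Hello, world"
print(strip_brackets)  # "Hello, world!"
```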
CodePudding user response:
Base R:
sapply(regmatches(df, regexec('(\\w.*)(.*\\w)', df)), "[", 1)
[1] "Mamie Smith" "Screamin' Jay Hawkins"
Or we could use str_extract from the stringr package with this regex:
library(stringr)
str_extract(df, '(\\w.*)(.*\\w)')
[1] "Mamie Smith" "Screamin' Jay Hawkins"
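Since the question asks for dplyr, the same extraction can be dropped into mutate. This is only a sketch and assumes the strings live in a data frame column; the tibble artists and the column names raw and name are made up for illustration. The pattern \\w.*\\w is a shortened form of the one above and, like it, needs at least two word characters to match.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame holding the raw strings from the question
artists <- tibble(raw = c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]"))

# Keep everything from the first word character to the last one
cleaned <- artists %>%
  mutate(name = str_extract(raw, "\\w.*\\w"))

cleaned$name
# "Mamie Smith", "Screamin' Jay Hawkins"
```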
CodePudding user response:
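Since [ and ] are special characters in a regular expression, you need to escape them with a double backslash (\\). The " is not a regex metacharacter, so escaping it is optional.
Here's some alternative code. Note that it leaves single quotes in place, so the first element keeps its ' characters: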
gsub('\\"|\\[|\\]', "", df)
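A sketch of what that call returns, plus one possible way to also drop the single quotes without touching the apostrophe in Screamin'. The anchored pattern is my own assumption, not part of the answer above: it removes a quote only when it sits right next to the opening or closing bracket.

```r
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")

# Removes ", [ and ] anywhere in the string; single quotes are left alone
partial <- gsub('\\"|\\[|\\]', "", df)
# partial is c("'Mamie Smith'", "Screamin' Jay Hawkins")

# Anchored variant: strip "[" plus a quote at the start,
# and a quote plus "]" at the end
cleaned <- gsub("^\\[[\"']|[\"']\\]$", "", df)
# cleaned is c("Mamie Smith", "Screamin' Jay Hawkins")
```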
CodePudding user response:
To pair up the square brackets with the accompanying type of quote, you can use:
\[(["'])(.*?)\1]
Explanation
- \[ matches a literal [
- (["']) is capture group 1, which captures either " or '
- (.*?) is capture group 2, which matches as few characters as possible
- \1 is a backreference to group 1, which matches the same type of quote
- ] matches a literal ]
df <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]")
gsub("\\[([\"'])(.*?)\\1]", "\\2", df)
Output
[1] "Mamie Smith" "Screamin' Jay Hawkins"
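To illustrate what the backreference buys, here is a small sketch with one extra, made-up element whose quotes do not match. Because \1 must repeat the opening quote, that element is not matched and comes back unchanged. Passing perl = TRUE is my own choice for this sketch, not something the pattern requires.

```r
# Third element is hypothetical: it opens with ' but closes with "
x <- c("['Mamie Smith']", "[\"Screamin' Jay Hawkins\"]", "['Mixed\"]")

# Replace the whole bracketed match with capture group 2 (the inner text)
result <- gsub("\\[([\"'])(.*?)\\1]", "\\2", x, perl = TRUE)

result
# The first two are unwrapped; the mismatched third is returned as-is
```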