Removing brackets in a string without the content

Time:11-22

I would like to rearrange my data. It consists only of names, but some of them contain a letter in brackets. I would like to get rid of the brackets while keeping their content, and end up with two names for each such entry: one with the bracketed letter and one without it.

For example:

df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")

I would like to end up with:

df <- c("Doilfal", "Dolfal", "Doilferl", "Dolferl", "Steff", "Steffl", "Steffe", "Steffi")

I tried

sub("(.*)(\\([a-z]\\))(.*)$", "\\1\\2, \\1\\2\\3", df)

but it does not quite work.

Thank you very much

Answer:

df <- gsub("[\\(\\)]", "", df)
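A quick check of what this one-liner returns on the sample vector from the question: it deletes the bracket characters and keeps the letter between them, but each input still yields a single name, so the shorter variants ("Dolfal", "Dolferl", "Steff") are not produced.

```r
df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
# Delete every "(" and ")" character; the letter between them is kept
stripped <- gsub("[\\(\\)]", "", df)
print(stripped)
# one output element per input element, no "without letter" variant
```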

Answer:

You made two small mistakes:

  1. In the first name you want \1\2\3, because you want all the letters. It's in the second name that you want \1\3 (skipping the bracketed letter).

  2. You placed the literal parentheses, e.g. the ( and ) of (i), inside the capture group, so they are captured as well. The capture group must enclose only the letter inside the parentheses.
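To make both mistakes visible, here is a small comparison on a single name, running the question's original pattern and the corrected one side by side. Group 2 holds "(i)" in the first call and just "i" in the second.

```r
x <- "Do(i)lfal"
# Original attempt: group 2 wraps the literal parentheses,
# and the first variant omits group 3
before <- sub("(.*)(\\([a-z]\\))(.*)$", "\\1\\2, \\1\\2\\3", x)
# Corrected: the escaped parentheses sit outside group 2,
# and the first variant uses \1\2\3, the second \1\3
after <- sub("(.*)\\(([a-z])\\)(.*)$", "\\1\\2\\3, \\1\\3", x)
print(before)  # "Do(i), Do(i)lfal"
print(after)   # "Doilfal, Dolfal"
```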

A small change to your regex does it:

sub("(.*)\\(([a-z])\\)(.*)$", "\\1\\2\\3, \\1\\3", df)
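The corrected call still returns one string per input element (for example "Doilfal, Dolfal"). A minimal sketch of turning those into separate names, assuming no name itself contains ", ": strsplit() is already vectorised, so it can split each element directly and unlist() flattens the resulting list.

```r
df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
# One comma-joined string per input, e.g. "Doilfal, Dolfal"; names without
# brackets do not match and are returned unchanged
pairs <- sub("(.*)\\(([a-z])\\)(.*)$", "\\1\\2\\3, \\1\\3", df)
# Split each element on the literal ", " and flatten into one character vector
result <- unlist(strsplit(pairs, ", ", fixed = TRUE))
print(result)
```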

Answer:

You can use

df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
unlist(strsplit(paste(sub("(.*?)\\(([a-z])\\)(.*)", "\\1\\2\\3, \\1\\3", df), collapse = ","), "\\s*,\\s*"))
# => [1] "Doilfal" 
#    [2] "Dolfal"  
#    [3] "Doilferl"
#    [4] "Dolferl" 
#    [5] "Steffl"  
#    [6] "Steff"   
#    [7] "Steffe"  
#    [8] "Steffi"  

Details:

  • First, sub is run with the regex (.*?)\(([a-z])\)(.*), which matches:
    • (.*?) - zero or more characters, as few as possible, captured into Group 1 (\1)
    • \( - a literal ( char
    • ([a-z]) - Group 2 (\2): any ASCII lowercase letter
    • \) - a literal ) char
    • (.*) - zero or more characters, as many as possible, captured into Group 3 (\3)
  • Then, the results are pasted together into a single string, with , as the collapse separator.
  • Then, that string is split with the \s*,\s* regex, which matches a comma surrounded by zero or more whitespace characters, and unlist turns the resulting list into a character vector.
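An alternative sketch that skips the paste/split round trip, so it would also work if a name happened to contain a comma. expand_brackets is a hypothetical helper name, not a base R function, and like the answers above it assumes each name has at most one bracketed lowercase letter (only the first one is expanded).

```r
# Hypothetical helper: expands each name containing "(x)" into two names
expand_brackets <- function(x) {
  has_bracket <- grepl("\\([a-z]\\)", x)
  with_letter <- sub("\\(([a-z])\\)", "\\1", x)  # "Do(i)lfal" -> "Doilfal"
  without_letter <- sub("\\([a-z]\\)", "", x)    # "Do(i)lfal" -> "Dolfal"
  # Keep input order: two names for bracketed entries, one otherwise
  out <- lapply(seq_along(x), function(i) {
    if (has_bracket[i]) c(with_letter[i], without_letter[i]) else x[i]
  })
  unlist(out)
}

df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
result <- expand_brackets(df)
print(result)
```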