I would like to rearrange the data I have. It consists only of names, but some of them contain a letter in parentheses. I would like to remove the parentheses but keep their content, so that each such entry ends up as two names: one with the letter and one without it.
For example:
df <- c ("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
I would like to end up with
df <- c( "Doilfal", "Dolfal", "Doilferl", "Dolferl", "Steff", "Steffl", "Steffe", "Steffi")
I tried
sub("(.*)(\\([a-z]\\))(.*)$", "\\1\\2, \\1\\2\\3", df)
But it does not quite work.
Thank you very much
CodePudding user response:
df = gsub("[\\(\\)]", "", df)
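As a quick check on the sample vector from the question, this call only strips the parentheses. It keeps the bracketed letter and does not produce the second variant without it, so the result still has five elements. The name `stripped` is just an illustrative variable.

```r
df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")

# Remove every "(" and ")" but keep whatever was between them
stripped <- gsub("[\\(\\)]", "", df)

stripped
# "Doilfal" "Doilferl" "Steffl" "Steffe" "Steffi"
```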
CodePudding user response:
You made two small mistakes:
In the first name you want \1\2\3, because you want all the letters. It is in the second name that you want \1\3 (skipping the bracketed letter).
You placed the parentheses themselves, as in (i), inside the capture group, so they are captured too. The capture group must go only around the letter inside the parentheses.
A small change to your regex does it:
sub("(.*)\\(([a-z])\\)(.*)$", "\\1\\2\\3, \\1\\3", df)
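A minimal sketch of what the corrected call returns on the sample vector. Note that each matched entry becomes one string holding both names separated by ", ", so an extra split is still needed to get a flat vector. The variable names `res` and `split_names` are illustrative, not from the question.

```r
df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")

# Entries with a bracketed letter become "with, without"; the others are unchanged
res <- sub("(.*)\\(([a-z])\\)(.*)$", "\\1\\2\\3, \\1\\3", df)
res
# "Doilfal, Dolfal" "Doilferl, Dolferl" "Steffl, Steff" "Steffe" "Steffi"

# Split each element on the literal ", " and flatten the list into one vector
split_names <- unlist(strsplit(res, ", ", fixed = TRUE))
split_names
# "Doilfal" "Dolfal" "Doilferl" "Dolferl" "Steffl" "Steff" "Steffe" "Steffi"
```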
CodePudding user response:
You can use
df <- c ("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")
unlist(strsplit( paste(sub("(.*?)\\(([a-z])\\)(.*)", "\\1\\2\\3, \\1\\3", df), collapse=","), "\\s*,\\s*"))
# => [1] "Doilfal"
# [2] "Dolfal"
# [3] "Doilferl"
# [4] "Dolferl"
# [5] "Steffl"
# [6] "Steff"
# [7] "Steffe"
# [8] "Steffi"
Details:
- First, sub is executed with the first regex, (.*?)\(([a-z])\)(.*), that matches
  - (.*?) - any zero or more chars as few as possible, captured into Group 1 (\1)
  - \( - a ( char
  - ([a-z]) - Group 2 (\2): any ASCII lowercase letter
  - \) - a ) char
  - (.*) - any zero or more chars as many as possible, captured into Group 3 (\3)
- Then, the results are pasted with a , char as the collapsing char
- Then, the resulting string is split with the \s*,\s* regex that matches a comma enclosed with zero or more whitespace chars.
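The three steps above can also be written one at a time, which makes the intermediate values easier to inspect. This is only a sketch of the same one-liner unrolled; the names `expanded`, `joined` and `result` are illustrative.

```r
df <- c("Do(i)lfal", "Do(i)lferl", "Steff(l)", "Steffe", "Steffi")

# Step 1: turn "Do(i)lfal" into "Doilfal, Dolfal"; names without parentheses stay as they are
expanded <- sub("(.*?)\\(([a-z])\\)(.*)", "\\1\\2\\3, \\1\\3", df)

# Step 2: join everything into a single comma-separated string
joined <- paste(expanded, collapse = ",")
joined
# "Doilfal, Dolfal,Doilferl, Dolferl,Steffl, Steff,Steffe,Steffi"

# Step 3: split on commas, swallowing any whitespace around them
result <- unlist(strsplit(joined, "\\s*,\\s*"))
length(result)
# 8
```

Joining and re-splitting is convenient here because the names themselves contain no commas. If they could, splitting each element of `expanded` separately would be the safer choice.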