I am trying to add a hyphen (-) between the letter S and any number in a column of a data frame. So, this is an example:
VariableA
TRS34
MMH22
GFSR104
GS23
RRTM55
P3
S4
My desired output is:
VariableA
TRS-34
MMH22
GFSR104
GS-23
RRTM55
P3
S-4
I was trying to use gsub:
gsub('^([a-z])-([0-9]+)$','\\1d\\2',myDF$VariableA)
but this is not working.
How can I solve this? Thanks!
CodePudding user response:
Your ^([a-z])-([0-9]+)$ regex attempts to match strings that start with a letter, then have a - and then one or more digits. This can't work because there are no hyphens in the strings yet: the hyphen is what you want to introduce.
You can use
gsub('(S)([0-9])', '\\1-\\2', myDF$VariableA)
The (S)([0-9]) regex matches and captures S into Group 1 (\1), then captures any digit into Group 2 (\2), and the replacement pattern is a concatenation of the group values with a hyphen in between.
If at most one substitution per string is expected, replace gsub with sub.
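As a small illustration of the difference between sub and gsub, here is a sketch using a made-up string (not from the question's data) that contains two S-plus-digit occurrences. The variable names are only for illustration.

```r
# Hypothetical input: two "S<digit>" occurrences in one string
x <- "S1S2"

# sub() replaces only the first match in each string
first_only <- sub('(S)([0-9])', '\\1-\\2', x)

# gsub() replaces every match in each string
all_hits <- gsub('(S)([0-9])', '\\1-\\2', x)

print(first_only)  # "S-1S2"
print(all_hits)    # "S-1S-2"
```

For the sample column in the question, each value has at most one such occurrence, so both functions should give the same result there.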
Other variations:
gsub('(S)(\\d)', '\\1-\\2', myDF$VariableA) # \d also matches digits
gsub('(?<=S)(?=\\d)', '-', myDF$VariableA, perl=TRUE) # Lookarounds make backreferences redundant
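To check these patterns against the sample data, the following is a minimal reproducible sketch. The data frame is rebuilt from the values shown in the question; the names expected, with_groups and with_lookarounds are assumptions introduced here for the comparison.

```r
# Sample data rebuilt from the question
myDF <- data.frame(
  VariableA = c("TRS34", "MMH22", "GFSR104", "GS23", "RRTM55", "P3", "S4"),
  stringsAsFactors = FALSE
)

# Desired output as listed in the question
expected <- c("TRS-34", "MMH22", "GFSR104", "GS-23", "RRTM55", "P3", "S-4")

# Capture-group version: keep S and the digit, put a hyphen between them
with_groups <- gsub('(S)([0-9])', '\\1-\\2', myDF$VariableA)

# Lookaround version: match the empty position between S and a digit
with_lookarounds <- gsub('(?<=S)(?=\\d)', '-', myDF$VariableA, perl = TRUE)

print(identical(with_groups, expected))
print(identical(with_lookarounds, expected))
```

Values such as GFSR104 are left alone because the S there is followed by a letter, not a digit.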
CodePudding user response:
You could also use lookbehinds if you set perl=TRUE:
> gsub('(?<=S)([0-9]+)', '-\\1', myDF$VariableA, perl=TRUE)
[1] "TRS-34" "MMH22" "GFSR104" "GS-23" "RRTM55" "P3" "S-4"
CodePudding user response:
Here is the version I like, using gsub:
myDF$VariableA <- gsub('S(\\d)', 'S-\\1', myDF$VariableA)
This requires only one capture group, because the literal S is written back in the replacement.
CodePudding user response:
Using the stringr package:
library(stringr)
str_replace_all(myDF$VariableA, 'S(\\d)', 'S-\\1')