Add symbol between the letter S and any number in a column dataframe

Time:10-21

I am trying to add a - between the letter S and any number that directly follows it in a column of a data frame. Here is an example:

VariableA

TRS34
MMH22
GFSR104
GS23
RRTM55
P3
S4

My desired output is:

VariableA
TRS-34
MMH22
GFSR104
GS-23
RRTM55
P3
S-4

I was trying to use gsub:

gsub('^([a-z])-([0-9]+)$','\\1d\\2',myDF$VariableA)

but this is not working.

How can I solve this? Thanks!

CodePudding user response:

Your ^([a-z])-([0-9]+)$ regex attempts to match whole strings that consist of a single letter, then a -, and then one or more digits. This can't work because there are no hyphens in the strings yet: the hyphen is what you want to introduce, so it belongs in the replacement, not in the pattern.
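To see why the original call leaves the column unchanged, here is a small check. It assumes the pattern was meant to contain the + quantifier (as the description "one or more digits" suggests) and uses a plain character vector x as a stand-in for myDF$VariableA; the string "s-4" is a made-up input, included only to show what the pattern would match.

```r
# Stand-in for myDF$VariableA (values copied from the question).
x <- c("TRS34", "MMH22", "GFSR104", "GS23", "RRTM55", "P3", "S4")

# The pattern from the question, assuming the quantifier was '+'.
old_pattern <- "^([a-z])-([0-9]+)$"

# None of the values contains a hyphen, so nothing matches
# and gsub() has nothing to replace.
no_match <- !any(grepl(old_pattern, x))
print(no_match)

# A hypothetical value that already has the hyphen is what the pattern describes.
hypothetical_match <- grepl(old_pattern, "s-4")
print(hypothetical_match)
```

When a pattern matches nothing, gsub returns its input unchanged, which is the "not working" behavior described in the question.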

You can use

gsub('(S)([0-9])', '\\1-\\2', myDF$VariableA)

The (S)([0-9]) regex matches and captures S into Group 1 (\1) and then any digit is captured into Group 2 (\2) and the replacement pattern is a concatenation of group values with a hyphen in between. If there is only one substitution expected, replace gsub with sub.
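As a quick, self-contained illustration of the explanation above, the sketch below rebuilds the example column and applies the replacement. The data frame is reconstructed from the question, and the column is assumed to be a character vector rather than a factor.

```r
# Rebuild the example column from the question (assumed to be character).
myDF <- data.frame(
  VariableA = c("TRS34", "MMH22", "GFSR104", "GS23", "RRTM55", "P3", "S4"),
  stringsAsFactors = FALSE
)

# Group 1 captures the S, group 2 captures the digit right after it;
# the replacement writes them back with a hyphen in between.
myDF$VariableA <- gsub("(S)([0-9])", "\\1-\\2", myDF$VariableA)

print(myDF$VariableA)
```

Note that gsub returns a new vector and does not modify the data frame in place, so the result is assigned back to the column here. "GFSR104" is left alone because its S is followed by R, not by a digit.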


Other variations:

gsub('(S)(\\d)', '\\1-\\2', myDF$VariableA)             # \d also matches digits
gsub('(?<=S)(?=\\d)', '-', myDF$VariableA, perl=TRUE)   # Lookarounds make backreferences redundant
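The variations can be checked against each other on the sample data. This is only a sanity check on the seven values from the question, using a plain vector x as a stand-in for the column; it does not prove the patterns are equivalent for every possible input.

```r
# Stand-in for myDF$VariableA (values copied from the question).
x <- c("TRS34", "MMH22", "GFSR104", "GS23", "RRTM55", "P3", "S4")

with_groups     <- gsub("(S)([0-9])", "\\1-\\2", x)            # two capture groups
with_escape     <- gsub("(S)(\\d)", "\\1-\\2", x)              # \d instead of [0-9]
with_lookaround <- gsub("(?<=S)(?=\\d)", "-", x, perl = TRUE)  # zero-width match

# All three approaches give the same vector on this sample.
same <- identical(with_groups, with_escape) &&
  identical(with_groups, with_lookaround)
print(same)
```

The lookaround version matches the empty position between S and the digit, so the replacement is just the hyphen; it needs perl = TRUE because lookarounds are not supported by R's default regex engine.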

CodePudding user response:

You could also use lookbehinds if you set perl=TRUE:

> gsub('(?<=S)([0-9]+)', '-\\1', myDF$VariableA, perl=TRUE)
[1] "TRS-34"  "MMH22"   "GFSR104" "GS-23"   "RRTM55"  "P3"      "S-4"

CodePudding user response:

Here is the version I like, using gsub:

myDF$VariableA <- gsub('S(\\d)', 'S-\\1', myDF$VariableA)

This requires using only one capture group.

CodePudding user response:

Using the stringr package:

library(stringr)
str_replace_all(myDF$VariableA, 'S(\\d)', 'S-\\1')