I have a large dataset with two kinds of labels. The first has the form 'numeric_alphanumeric_alpha' and the second has the form 'alphanumeric_alpha'. I need to strip the numeric prefix from the first kind of label so that it matches the second. I know how to remove numbers from alphanumeric data (as below), but this would also remove numbers that I need.
gsub('[0-9]+', '', x)
Below is an example of the two different labels I encounter:
c('12345_F24R2_ABC', 'r87R2_DEFG')
Below is the desired output:
c('F24R2_ABC', 'r87R2_DEFG')
CodePudding user response:
A simple regex can do it. ^ refers to the start of the string, \\d refers to any digit, + indicates that the digit appears one or more times, and _ matches the literal underscore that follows the prefix.
gsub("^\\d+_", "", c('12345_F24R2_ABC', 'r87R2_DEFG'), perl = TRUE)
[1] "F24R2_ABC" "r87R2_DEFG"
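To make the difference from the question's attempt concrete, here is a minimal sketch that runs both patterns on the two example labels. The variable names (labels, stripped_all, stripped_prefix) are only illustrative, and sub() is used instead of gsub() on the assumption that an anchored pattern can match at most once per string anyway.

```r
labels <- c('12345_F24R2_ABC', 'r87R2_DEFG')

# Unanchored pattern: removes every run of digits, including the ones to keep
stripped_all <- gsub('[0-9]+', '', labels)
print(stripped_all)
# [1] "_FR_ABC" "rR_DEFG"

# Anchored pattern: removes only a leading run of digits followed by an underscore
stripped_prefix <- sub('^[0-9]+_', '', labels)
print(stripped_prefix)
# [1] "F24R2_ABC"  "r87R2_DEFG"
```

The second label is left untouched because it does not start with a digit, so the anchored pattern has nothing to match.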
CodePudding user response:
Your code, slightly modified:
^[0-9]*
..... matches zero or more digits at the start of the string
\\_
.... matches underscore
gsub('^[0-9]*\\_', '', x)
[1] "F24R2_ABC" "r87R2_DEFG"
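One thing to be aware of: because * allows zero digits, this pattern also strips a bare leading underscore, whereas + requires at least one digit. The sketch below illustrates this with a hypothetical label '_ABC' that is not part of the question's data; whether that case matters depends on the real dataset.

```r
# '_ABC' is a made-up edge case, added only to show where * and + differ
edge <- c('12345_F24R2_ABC', '_ABC', 'r87R2_DEFG')

# * (zero or more digits): the leading underscore of '_ABC' is removed too
with_star <- sub('^[0-9]*_', '', edge)
print(with_star)
# [1] "F24R2_ABC"  "ABC"        "r87R2_DEFG"

# + (one or more digits): '_ABC' is left unchanged
with_plus <- sub('^[0-9]+_', '', edge)
print(with_plus)
# [1] "F24R2_ABC"  "_ABC"       "r87R2_DEFG"
```

For the two labels in the question, both variants give the same result.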