Substitute a string when there is a dot, a number and ':'

Time:10-23

I have strings that look like these:

> ABCD.1:f_HJK
> ABFD.1:f_HTK
> CJD:f_HRK
> QQYP.2:f_HDP

So basically, the first part is always a string, optionally followed by a '.' and a number; after that there is always a ':' and another string. Using R, I would like to remove the '.number' part whenever it is present.

I know that regular expressions could be useful, but I have no idea how to apply them in this context. I know that I can substitute the '.' with gsub, but I don't know how to include the number and the ':' in the pattern.
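
For reference, here is a minimal sketch of why substituting only the '.' is not enough. It uses the example strings above; the variable name `v` is just for illustration. With `fixed = TRUE` the dot is removed but the digit stays, and without it an unescaped `.` is a regex wildcard that matches every character.

```r
# Example strings from the question (v is an illustrative name)
v <- c("ABCD.1:f_HJK", "ABFD.1:f_HTK", "CJD:f_HRK", "QQYP.2:f_HDP")

# fixed = TRUE treats "." as a literal dot: the dot goes, the digit stays
gsub(".", "", v, fixed = TRUE)
# gives "ABCD1:f_HJK" "ABFD1:f_HTK" "CJD:f_HRK" "QQYP2:f_HDP"

# As a regular expression, an unescaped "." matches any character,
# so every character is removed
gsub(".", "", v)
# gives "" "" "" ""
```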

Thank you for your help.

Answer:

Does this work?

v <- c('ABCD.1:f_HJK','ABFD.1:f_HTK','CJD:f_HRK','QQYP.2:f_HDP')
v
[1] "ABCD.1:f_HJK" "ABFD.1:f_HTK" "CJD:f_HRK"    "QQYP.2:f_HDP"
gsub('([A-Z]+)(\\.\\d+)?(:.*)', '\\1\\3', v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK"  "QQYP:f_HDP"
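
A variant of the same idea, sketched here as an alternative rather than as part of the answer above, swaps the capture groups for a Perl-style lookahead: `(?=:)` requires a ':' right after the '.digits' run without consuming it, so nothing has to be put back in the replacement. It assumes `perl = TRUE`; `\\d+` also covers multi-digit numbers such as the hypothetical `XY.12:z`.

```r
v <- c("ABCD.1:f_HJK", "ABFD.1:f_HTK", "CJD:f_HRK", "QQYP.2:f_HDP")

# "\\.\\d+" matches a dot followed by one or more digits;
# the lookahead "(?=:)" only checks that a ':' comes next
sub("\\.\\d+(?=:)", "", v, perl = TRUE)
# gives "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"

# Hypothetical multi-digit case, not from the question
sub("\\.\\d+(?=:)", "", "XY.12:z", perl = TRUE)
# gives "XY:z"
```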

Answer:

You could also use any of the following, depending on the structure of your strings:

  • If there is no other period followed by digits anywhere in the string:

     sub("\\.\\d+", "", v)
     [1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK"  "QQYP:f_HDP"
    
  • If the period and number always come directly after the leading letters and before the ':' (only a match at the start of the string is replaced):

     sub("^([A-Z]+)\\.\\d+:", "\\1:", v)
     [1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK"  "QQYP:f_HDP"
    
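  • Same idea as above, but with Perl-compatible regex (perl = TRUE): \\K discards the leading letters from the match, so no capture group or back-reference is needed:

      sub("^[A-Z]+\\K\\.\\d+", "", v, perl = TRUE)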
      [1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK"  "QQYP:f_HDP"
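
To see how the choice depends on the structure of the strings, here is a sketch on two hypothetical inputs that are not in the question: one has a second '.digit' after the ':', the other has a '.digit' that is not directly before the ':'.

```r
# Hypothetical inputs (not from the question)
x <- c("ABCD.1:f_H.2K", "AB.1C:f_HJK")

# sub(): removes only the first ".digits", wherever it is
sub("\\.\\d+", "", x)
# gives "ABCD:f_H.2K" "ABC:f_HJK"

# gsub(): removes every ".digits"
gsub("\\.\\d+", "", x)
# gives "ABCD:f_HK" "ABC:f_HJK"

# Anchored: only letters + ".digits" + ':' at the start; otherwise unchanged
sub("^([A-Z]+)\\.\\d+:", "\\1:", x)
# gives "ABCD:f_H.2K" "AB.1C:f_HJK"
```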
    

Answer:

If I understood your explanation correctly, this should do the trick:

gsub("\\.\\d+", "", string)  # string: your character vector
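
Because `gsub()` removes every '.digits' run in the string, not only the one before the ':', one way to check that this simpler pattern is safe for your data is to compare it with an anchored pattern. A minimal sketch, using the four example strings from the question as stand-ins for `string`:

```r
string <- c("ABCD.1:f_HJK", "ABFD.1:f_HTK", "CJD:f_HRK", "QQYP.2:f_HDP")

simple   <- gsub("\\.\\d+", "", string)               # removes every ".digits"
anchored <- sub("^([A-Z]+)\\.\\d+:", "\\1:", string)  # only letters + ".digits" + ':' at the start

# TRUE means both patterns give the same result on these inputs
identical(simple, anchored)

# Inputs where the two patterns disagree (none for these four strings)
string[simple != anchored]
```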