R: How to extract everything after the first occurrence of a dot (.)

I need to extract from a string such as outside.HLA.DR.highpass the part after the first dot, yielding HLA.DR.highpass.

Importantly, the middle part of the string (outside.xxx.highpass) might or might not contain additional dots; e.g. outside.CD19.highpass should yield CD19.highpass.

I already have a similar step that extracts the first part: sub(".[^.]+$", "", "outside.HLA.DR.highpass") returns "outside.HLA.DR". However, I have not managed to adapt it so that it returns only the part of the string after the first dot. Any help is greatly appreciated!

CodePudding user response：

You want to use a non-greedy regex quantifier here: match the start of the string (^), followed by the fewest possible characters (.*?), followed by a literal dot (\\.).

sub("^.*?\\.", "", "outside.HLA.DR.highpass")
# "HLA.DR.highpass"
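To illustrate why the non-greedy quantifier matters, here is a small sketch comparing it with the greedy form on both example strings from the question. The variable names `s`, `lazy`, and `greedy` are only illustrative.

```r
s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")

# Lazy quantifier (.*?): the match stops at the FIRST literal dot,
# so only the leading "outside." is removed.
lazy <- sub("^.*?\\.", "", s)

# Greedy quantifier (.*): the match runs up to the LAST literal dot,
# so everything except the final component is removed.
greedy <- sub("^.*\\.", "", s)

print(lazy)
print(greedy)
```

sub() is vectorized over its third argument, so the same call works on a whole character vector.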

CodePudding user response：

Your solution for extracting the first part is correct. Simply apply a similar rule, anchored at the start of the string:

sub("^[^.]+\\.", "", "outside.HLA.DR.highpass")

This should return the desired string.
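As a sketch of how the negated character class behaves on a vector, the example below uses the pattern with the dot escaped and adds a hypothetical input without any dot ("nodot", not from the question) to show one reason the escaping may matter.

```r
s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass", "nodot")

# [^.]+ matches one or more non-dot characters from the start of the string;
# \\. then requires a literal dot, so a string without a dot is left unchanged.
escaped <- sub("^[^.]+\\.", "", s)

# With an unescaped "." the final token matches ANY character, so a string
# without a dot is matched in full and replaced by an empty string.
unescaped <- sub("^[^.]+.", "", "nodot")

print(escaped)
print(unescaped)
```

For strings that do contain a dot, both patterns give the same result; the difference only appears on inputs with no dot at all.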

CodePudding user response：

Here's a solution with stringr that will work on a vector.

library(stringr)

s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")

str_sub(
  s, 
  start = str_locate(s, fixed("."))[, 1] + 1, 
  end   = str_length(s)
)
#> [1] "HLA.DR.highpass" "CD19.highpass"

Created on 2022-07-13 by the reprex package (v2.0.1)
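For readers who prefer to avoid an extra package, the same locate-then-slice idea can be sketched in base R. This is an alternative under the assumption that base functions are acceptable, not part of the answer above; `first_dot` and `res` are illustrative names.

```r
s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")

# regexpr() gives the position of the first match in each element;
# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
first_dot <- as.integer(regexpr(".", s, fixed = TRUE))

# substring() keeps everything from the character after the first dot
# to the end of each string.
res <- substring(s, first_dot + 1)

print(res)
```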