I need to extract, from a string such as outside.HLA.DR.highpass, the part after the first dot, yielding HLA.DR.highpass.
Importantly, the middle part of the string (outside.xxx.highpass) might or might not contain additional dots; e.g. outside.CD19.highpass should yield CD19.highpass as well.
I have a similar step where I extract the first part with sub(".[^.]+$", "", "outside.HLA.DR.highpass"), which returns "outside.HLA.DR". However, I have failed to adapt it so that it returns only the part of the string after the first dot.
Any help is greatly appreciated!
CodePudding user response:
You want to use a non-greedy regex operator here: match the start of the string (^), followed by the fewest possible characters (.*?), followed by a literal dot (\\.):
sub("^.*?\\.", "", "outside.HLA.DR.highpass")
# "HLA.DR.highpass"
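To see why the non-greedy quantifier matters, here is a small sketch comparing it with the greedy form on the two example strings from the question. The variable names (x, lazy, greedy) are only illustrative.

```r
# Example inputs taken from the question; the name `x` is illustrative.
x <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")

# Non-greedy: ".*?" stops at the first dot, so only the leading part is removed.
lazy <- sub("^.*?\\.", "", x)

# Greedy: ".*" runs up to the last dot, so only the final part survives.
greedy <- sub("^.*\\.", "", x)

print(lazy)
print(greedy)
```

Since sub() is vectorised over its third argument, the same call works on a whole column of names without a loop.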
CodePudding user response:
Your solution for extracting the first part is correct. Simply apply a similar rule, anchored at the start of the string instead of the end:
sub("^[^.]+.", "", "outside.HLA.DR.highpass")
This should return the desired string.
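As a hedged variation on the pattern above, the sketch below escapes the final dot so that it only matches a literal dot. With an unescaped dot, a string that contains no dot at all would be consumed entirely; the third test string ("nodot") is a made-up input added to show the difference.

```r
# Two inputs from the question plus a hypothetical one without any dot.
x <- c("outside.HLA.DR.highpass", "outside.CD19.highpass", "nodot")

# "[^.]+" matches one or more non-dot characters from the start of the string;
# "\\." then requires a literal dot, so "nodot" does not match and stays as-is.
escaped <- sub("^[^.]+\\.", "", x)

# With a bare "." the last character of "nodot" satisfies the final ".",
# so the whole string is matched and removed.
unescaped <- sub("^[^.]+.", "", x)

print(escaped)
print(unescaped)
```

If every input is guaranteed to contain at least one dot, both patterns give the same result.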
CodePudding user response:
Here's a solution with stringr that will work on a vector.
library(stringr)
s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")
str_sub(
  s,
  # position of the first literal dot, plus one to start just after it
  start = str_locate(s, fixed("."))[, 1] + 1,
  end = str_length(s)
)
#> [1] "HLA.DR.highpass" "CD19.highpass"
Created on 2022-07-13 by the reprex package (v2.0.1)
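The same locate-then-cut idea can also be sketched in base R, without stringr. This is an alternative under the assumption that every input contains at least one dot; the names s, first_dot and after_dot are illustrative.

```r
s <- c("outside.HLA.DR.highpass", "outside.CD19.highpass")

# regexpr() with fixed = TRUE finds the position of the first literal "."
# in each element, with no regex interpretation of the dot.
first_dot <- as.integer(regexpr(".", s, fixed = TRUE))

# substring() keeps everything from the character after that dot to the end.
after_dot <- substring(s, first_dot + 1L)

print(after_dot)
```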