Extract all digits values after first underscore

Time:06-15

I want to extract the number after the first underscore (_), but I don't understand why only a single digit is matched.

My sample data is:

myvec<-c("increa_0_1-1","increa_9_25-112","increa_25-50-76" )
as.numeric(gsub("(.*_){1}(\\d)_.+", "\\2", myvec))
[1]  0  9 NA
Warning message:
NAs introduced by coercion 

I'd like:

[1]  0  9 25

Can anyone help with this?

CodePudding user response:

library(stringr)
str_extract(myvec, "(?<=_)[0-9]+")
[1] "0"  "9"  "25"
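
If adding stringr is not an option, the same lookbehind idea can be sketched in base R. This is only an illustration, not part of the answer above: it assumes perl = TRUE is acceptable (TRE, the default engine, does not support lookbehind), and the names m and first_num are made up for the example.

```r
myvec <- c("increa_0_1-1", "increa_9_25-112", "increa_25-50-76")

# regexpr() locates only the first match in each string;
# perl = TRUE is needed for the (?<=_) lookbehind.
m <- regexpr("(?<=_)[0-9]+", myvec, perl = TRUE)

# regmatches() pulls out the matched substrings as character values.
first_num <- regmatches(myvec, m)
first_num

# Convert to numeric if numbers are needed.
as.numeric(first_num)
```

Note that regmatches() drops elements that have no match, so the result can be shorter than the input, whereas str_extract() returns NA for those elements.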

CodePudding user response:

You can use sub (because you will need a single search and replace operation) with a pattern like ^[^_]*_(\d+).*:

myvec<-c("increa_0_1-1","increa_9_25-112","increa_25-50-76" )
sub("^[^_]*_(\\d+).*", "\\1", myvec)
# => [1] "0"  "9"  "25"

See the R demo and the regex demo.

Regex details:

  • ^ - start of string
  • [^_]* - a negated character class that matches any zero or more chars other than _
  • _ - a _ char
  • (\d+) - Group 1 (\1 refers to the value captured into this group from the replacement pattern): one or more digits
  • .* - the rest of the string (. in TRE regex matches line break chars by default).
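
The pieces listed above can be tried out with a short sketch. One caveat worth illustrating: sub() returns the input unchanged when the pattern does not match, so a guard with grepl() is one possible way to get NA instead. The extra input "nodigits" and the names pattern, digits, x and safe are hypothetical, added only for this example.

```r
myvec <- c("increa_0_1-1", "increa_9_25-112", "increa_25-50-76")
pattern <- "^[^_]*_(\\d+).*"

# Replace the whole string with Group 1 (the digits after the first underscore).
digits <- sub(pattern, "\\1", myvec)
digits

# Convert to numeric once every element is known to match.
as.numeric(digits)

# "nodigits" is a made-up input with no underscore: sub() alone would
# return it unchanged, so guard with grepl() and use NA for non-matches.
x <- c(myvec, "nodigits")
safe <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
safe
```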

CodePudding user response:

myvec<-c("increa_0_1-1","increa_9_25-112","increa_25-50-76" )
as.numeric(gsub("[^_]*_(\\d+).*", "\\1", myvec))
[1]  0  9 25
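
Why the attempt in the question yields NA for the third element can be sketched as follows, assuming the question's pattern ends in `_.+`. That pattern captures exactly one digit and then requires a second underscore, which "increa_25-50-76" does not have, so gsub() leaves that string untouched and as.numeric() cannot convert it. The names attempt, res_attempt and res_fixed are made up for the comparison.

```r
myvec <- c("increa_0_1-1", "increa_9_25-112", "increa_25-50-76")

# Assumed form of the pattern from the question: one digit, then "_".
attempt <- "(.*_){1}(\\d)_.+"
res_attempt <- gsub(attempt, "\\2", myvec)
res_attempt  # the third element comes back unchanged (no match)

# Pattern from this answer: one or more digits, no second "_" required.
res_fixed <- gsub("[^_]*_(\\d+).*", "\\1", myvec)
res_fixed

# Only the second result converts cleanly to numbers.
suppressWarnings(as.numeric(res_attempt))
as.numeric(res_fixed)
```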

CodePudding user response:

Another possible solution, based on stringr::str_extract:

library(stringr)

myvec<-c("increa_0_1-1","increa_9_25-112","increa_25-50-76" )

as.numeric(str_extract(myvec, "(?<=_)\\d+"))

#> [1]  0  9 25