I have a list of strings that looks like this:
list=["chr21-10139833-A-C","chry-10139832-b-f"]
For every string in the list, I need to extract the number between the first "-" and the second "-",
so I would get:
[10139833,10139832]
I tried this:
gsub(".*[-]([^-]+)[-]", "\\1", list)
but it returns:
[ac,bf]
What can I do to make it work? Thank you.
CodePudding user response:
Using str_extract from stringr, we can try:
library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums
[1] "10139833" "10139832"
We could also use sub for a base R option:
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums
[1] "10139833" "10139832"
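As a rough sketch of why the attempt in the question fails while the sub call above works: the greedy ".*" in the original pattern runs up to the last "-...-" pair, so only that trailing segment is captured and everything before it is discarded. Requiring digits in the capture group pins the match to the numeric field. The names attempt and nums_numeric are illustrative only, and as.numeric is added here on the assumption that numeric rather than character output is wanted.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Pattern from the question: ".*" is greedy, so the capture group ends up
# holding the LAST hyphen-delimited segment, not the number.
attempt <- gsub(".*[-]([^-]+)[-]", "\\1", x)

# Capturing digits only forces the group onto the numeric field;
# the trailing ".*" consumes the rest so the whole string is replaced.
nums_character <- sub(".*-(\\d+).*", "\\1", x)

# Convert to numeric if numbers rather than strings are needed.
nums_numeric <- as.numeric(nums_character)
print(nums_numeric)
```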
CodePudding user response:
You can use str_split_i (available in stringr 1.5.0 and later) to get the i-th split string:
library(stringr)
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_split_i(str, "-", i = 2)
#[1] "10139833" "10139832"
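If an older stringr is installed, or no extra package is wanted, the same split-and-pick idea can be sketched in base R. This is only an illustration, assuming every string has at least two "-"-separated fields. The variable names are made up for the example.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Split each string on a literal "-" (fixed = TRUE avoids regex interpretation).
parts <- strsplit(x, "-", fixed = TRUE)

# Take the second piece of every split; vapply guarantees a character vector.
second <- vapply(parts, function(p) p[2], character(1))

# Optional: convert to numeric.
nums <- as.numeric(second)
print(second)
```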
CodePudding user response:
1) Using the input shown in the Note at the end, use read.table. If you want character output instead, add the colClasses = "character" argument to read.table.
read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832
2) Another possibility is to use strapply. If you want character output, omit the as.numeric argument.
library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832
Note
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")
CodePudding user response:
If your strings always have this structure, consisting only of word characters and hyphens, you could match one or more digits between word boundaries:
library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_extract(list, "\\b\\d+\\b")
Or, with a Perl-like pattern and \K, you might also use:
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
regmatches(list, regexpr("-\\K\\d+(?=-)", list, perl = TRUE))
Both will output:
[1] "10139833" "10139832"
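One difference worth keeping in mind: when a string has no match, str_extract returns NA for it, whereas regmatches with regexpr drops that element, so the result can be shorter than the input. The sketch below keeps the output aligned with the input in base R. The middle element "no_number_here" is a hypothetical input added only to show the behaviour.

```r
# The second element is a made-up string with no "-digits-" field.
x <- c("chr21-10139833-A-C", "no_number_here", "chry-10139832-b-f")

# regexpr returns -1 for elements without a match.
m <- regexpr("-\\K\\d+(?=-)", x, perl = TRUE)

# Pre-fill with NA, then place each match at the position it came from.
res <- rep(NA_character_, length(x))
res[m != -1] <- regmatches(x, m)
print(res)
```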