extract substring between "-" and "-" in string in R

Time:12-20

I have a list of strings that looks like this:

list=["chr21-10139833-A-C","chry-10139832-b-f"]

For every string in the list, I need to extract the number between the first "-" and the second "-",

so I would get:

[10139833,10139832]

I tried this:

gsub(".*[-]([^-]+)[-]", "\\1", list)

but it returns:

[ac,bf]

What can I do to make it work? Thank you.

CodePudding user response:

Using str_extract from stringr we can try:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums

[1] "10139833" "10139832"

We could also use sub for a base R option:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums

[1] "10139833" "10139832"
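As a side note on why the attempt in the question returned letters: assuming its pattern used the quantifier `[^-]+`, the greedy `.*` consumes as much as it can, so the capture group lands on the last field that is still followed by a hyphen. The sketch below reproduces that and shows one possible base R fix that pins the capture to the second field; the variable names are only illustrative.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Greedy ".*" swallows everything up to the LAST "-field-" it can find,
# so the capture group holds "A" / "b" and the trailing letter is left over.
attempt <- gsub(".*[-]([^-]+)[-]", "\\1", x)

# Anchoring at the start and forbidding hyphens in the prefix keeps the
# capture group on the second hyphen-delimited field.
fixed <- sub("^[^-]*-([^-]+)-.*$", "\\1", x)

print(attempt)
print(fixed)
```

The anchored version does not care whether the second field is numeric, so it only fits inputs shaped like the ones in the question.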

CodePudding user response:

You can use str_split_i to get the ith split string:

library(stringr)
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")

str_split_i(str, "-", i = 2)
#[1] "10139833" "10139832"
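str_split_i was added in stringr 1.5.0, so on older installations a base R equivalent may be handy. This is a minimal sketch of the same idea using strsplit, assuming every string has at least two fields; fields and second are illustrative names.

```r
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# strsplit() returns one character vector of fields per input string;
# fixed = TRUE treats "-" as a literal string rather than a regex.
fields <- strsplit(str, "-", fixed = TRUE)

# Take the second field of each; vapply() guarantees a character vector.
second <- vapply(fields, function(parts) parts[2], character(1))

print(second)
```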

CodePudding user response:

1) Using the input x shown in the Note at the end, use read.table with "-" as the separator and take the second column. If you want character output instead, add the colClasses = "character" argument to read.table.

read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832

2) Another possibility is to use strapply from the gsubfn package. If you want character output, omit the as.numeric argument.

library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832

Note

x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

CodePudding user response:

If the structure of your strings is always like that, with only word characters and hyphens, you could match one or more digits between word boundaries:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_extract(list, "\\b\\d+\\b")

Or, with a Perl-compatible pattern and \K (which drops everything matched so far from the result), you might also use:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
regmatches(list, regexpr("-\\K\\d+(?=-)", list, perl = TRUE))

Both will output:

[1] "10139833" "10139832"
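Since the question asks for numbers and the regex-based answers return character vectors, a conversion step may be wanted. A small sketch in base R, assuming the values fit in R's 32-bit integer range (use as.numeric otherwise); x and nums are illustrative names.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Extract the digits that sit between two hyphens (PCRE syntax, hence perl = TRUE).
chars <- regmatches(x, regexpr("-\\K\\d+(?=-)", x, perl = TRUE))

# Convert the matched text to integers.
nums <- as.integer(chars)

print(nums)
```

Note that regmatches with regexpr silently drops elements that have no match, so the result can be shorter than the input when some strings do not follow the expected shape.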