strsplit with multiple delimiters in R

Time:05-20

I want to split this genomic coordinate: chr1:713625-714625

to keep only the start coordinate: 713625

I tried this command:

data.table(unlist(lapply(data$gene, function(x) unlist(strsplit(x, "[:]"))[2])))$V1

but it gives me this: 713625-714625

Do you have any suggestions? Thank you in advance.

CodePudding user response:

You are almost there with strsplit, but the pattern should split on both delimiters. Use either the character class "[:-]" or the alternation ":|-":

> unlist(strsplit("chr1:713625-714625", "[:-]"))[2]
[1] "713625"

> unlist(strsplit("chr1:713625-714625", ":|-"))[2]
[1] "713625"
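
Since the question applies the split to a whole column (data$gene), here is a minimal base R sketch of the same idea over several strings at once. The vector name coords and its second value are made up for illustration; only the first value comes from the question.

```r
# Hypothetical input; only the first value appears in the question.
coords <- c("chr1:713625-714625", "chr2:100-200")

# strsplit() is vectorized: it returns a list with one character vector per input string.
parts <- strsplit(coords, "[:-]")

# Keep the second piece (the start coordinate) of each split.
start <- vapply(parts, function(p) p[2], character(1))

# Convert to numeric if the coordinates are needed as numbers.
start_num <- as.numeric(start)
print(start_num)
```

Because strsplit() already accepts a character vector, the lapply() wrapper from the question is not strictly needed; vapply() only picks the second element of each result.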

CodePudding user response:

The following code extracts everything between the : and - in the string:

string <- c("chr1:713625-714625")
gsub(".*[:]([^.]+)[-].*", "\\1", string)

Output:

[1] "713625"
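
As a possible variation, the capture group can be restricted to digits so that it stops at the first hyphen after the colon. This is a sketch under the assumption that coordinates always look like name:start-end; the variable name start is introduced here for illustration.

```r
string <- c("chr1:713625-714625")

# ^[^:]*:    everything up to and including the first colon
# ([0-9]+)   the start coordinate (digits only), captured as group 1
# -.*$       the hyphen and the rest of the string
start <- sub("^[^:]*:([0-9]+)-.*$", "\\1", string)
print(start)
```

If a string does not match the pattern, sub() returns it unchanged, so it may be worth checking the result when the input format is not guaranteed.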

CodePudding user response:

I tried these two commands and both give me the same result:

gsub(".*[:]([^.]+)[-].*", "\\1", string)  # by Quinten

data.table(unlist(lapply(data$gene, function(x) unlist(strsplit(x, "[:-]"))[2])))$V1