I want to split this genomic coordinate: chr1:713625-714625
so that I keep only the start coordinate: 713625
I tried this command:
data.table(unlist(lapply(data$gene, function(x) unlist(strsplit(x, "[:]"))[2])))$V1
but it gives me this: 713625-714625
Do you have any suggestions? Thank you in advance.
CodePudding user response:
You are almost there with strsplit, but you should split on [:-] or :|- so that both the colon and the hyphen act as delimiters:
> unlist(strsplit("chr1:713625-714625", "[:-]"))[2]
[1] "713625"
> unlist(strsplit("chr1:713625-714625", ":|-"))[2]
[1] "713625"
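The one-liners above work on a single string. For a whole column such as data$gene, the same split can be applied to every element at once, since strsplit is vectorized. A minimal base R sketch, assuming a hypothetical character vector `genes` standing in for data$gene:

```r
# Hypothetical example vector standing in for data$gene
genes <- c("chr1:713625-714625", "chr2:100-200")

# Split each string on ":" or "-"; the result is a list with one vector per string
parts <- strsplit(genes, "[:-]")

# Keep the second piece of each split, which is the start coordinate
starts <- vapply(parts, function(p) p[2], character(1))

# Convert to numbers if the coordinates are needed for arithmetic
starts_num <- as.numeric(starts)
```

Note that this assumes the chromosome name itself contains no hyphen or colon; otherwise the second piece would not be the start coordinate.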
CodePudding user response:
The following code extracts everything between the : and the - in the string:
string <- c("chr1:713625-714625")
gsub(".*[:]([^.]+)[-].*", "\\1", string)
Output:
[1] "713625"
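A slightly stricter variant of the same idea, as a sketch rather than a definitive pattern: anchor the regex at both ends and capture only digits, so the match does not depend on what else the string contains. The second element of `string` here is a made-up example added for illustration.

```r
# The second element is a hypothetical extra coordinate for illustration
string <- c("chr1:713625-714625", "chrX:5000-6000")

# Skip everything up to the first ":", capture the digits that follow,
# then require a "-" and discard the rest of the string
start <- sub("^[^:]*:([0-9]+)-.*$", "\\1", string)
```

Because only one replacement per string is needed, sub is sufficient here in place of gsub.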
CodePudding user response:
I tried these two commands and both give me the same result:
gsub(".*[:]([^.]+)[-].*", "\\1", string) (from Quinten's answer)
data.table(unlist(lapply(data$gene,function(x)unlist(strsplit(x, "[:-]"))[2])))$V1