Is there a quick way to transform intervals (start and end) into a list of the numbers in each interval in R?

Time:11-09

I have a file of about 50M lines containing intervals like these:

> data
  start_pos end_pos
1         1      10
2         3       6
3         5       9
4         6      11

I would like a table of position occurrences, i.e. the coverage of each position (how many intervals contain it), like this:

> occurence
position  coverage
1         1
2         1
3         2
4         2
5         3
6         4
7         3
8         3
9         3
10        2
11        1

Is there a fast, memory-efficient way to do this in R?

My plan was to loop over the rows, append the sequence start_pos:end_pos of each interval to a single vector, and then convert that vector into a table:

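# Append every position of every interval to one growing vector
count <- c()
for (row in seq_len(nrow(data))) {
  count <- c(count, data$start_pos[row]:data$end_pos[row])
}

# Count how many times each position occurs (its coverage)
occurence <- table(count)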


The problem is that my file is huge, and this takes far too much time and memory.
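A minimal sketch of the same expand-and-tabulate plan in vectorized base R, assuming data is a data frame with integer-valued start_pos and end_pos columns (rebuilt here from the four example rows as a stand-in for the real file). Growing count with c() copies the whole vector on every iteration, so the loop cost grows quadratically with the output length; building each interval's sequence with Map() and joining them once with unlist() avoids that. It still materializes one element per covered position, so memory use is unchanged and may remain prohibitive for 50M intervals.

```r
# Example intervals from the question (stand-in for the 50M-line file)
data <- data.frame(start_pos = c(1, 3, 5, 6), end_pos = c(10, 6, 9, 11))

# Expand every interval in one vectorized pass instead of growing `count`
# inside a loop; unlist() allocates the combined vector only once.
count <- unlist(Map(`:`, data$start_pos, data$end_pos), use.names = FALSE)

# Tabulate how many intervals cover each position
occurence <- table(count)
print(occurence)
```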

Answer:

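The Bioconductor IRanges package does this quickly and memory-efficiently. Its coverage() function returns the number of ranges covering each position as a run-length encoded (Rle) vector: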

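library(IRanges)
# One range per interval (the four example rows from the question)
ir <- IRanges(start = c(1, 3, 5, 6), end = c(10, 6, 9, 11))
# Number of ranges covering each position, run-length encoded
coverage(ir)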
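A sketch of how the same call might be applied to the question's data frame and decoded into the position/coverage table the question asks for. It assumes IRanges is installed from Bioconductor and that data has integer-valued start_pos and end_pos columns; the names cvg and occurrence are illustrative only.

```r
library(IRanges)  # Bioconductor package; install with BiocManager::install("IRanges")

# Example intervals from the question (stand-in for the real file)
data <- data.frame(start_pos = c(1, 3, 5, 6), end_pos = c(10, 6, 9, 11))

# One range per row; coverage() counts overlapping ranges per position (an Rle)
cvg <- coverage(IRanges(start = data$start_pos, end = data$end_pos))

# Decode the run-length encoding into a plain position/coverage data frame
occurrence <- data.frame(position = seq_len(length(cvg)),
                         coverage = as.vector(cvg))
print(occurrence)
```

If the positions span a very long range, runValue(cvg) and runLength(cvg) (from S4Vectors, which IRanges loads) expose the compressed runs directly, so the result need not be expanded to one row per position.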

which gives:

> coverage(ir) |> as.data.frame()
   value
1      1
2      1
3      2
4      2
5      3
6      4
7      3
8      3
9      3
10     2
11     1
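Where Bioconductor is not available, the same coverage can be sketched in base R with a difference array (a swapped-in sweep-line technique, not necessarily what IRanges does internally): add +1 at each start, -1 just after each end, and take a running sum. This assumes positions are positive integers; coverage_by_position is a hypothetical helper name.

```r
# Example intervals from the question (stand-in for the real file)
data <- data.frame(start_pos = c(1, 3, 5, 6), end_pos = c(10, 6, 9, 11))

# Difference-array coverage: +1 where an interval starts, -1 one position
# after it ends; the cumulative sum is the coverage at each position.
coverage_by_position <- function(start, end) {
  n <- max(end)
  delta <- tabulate(start, nbins = n + 1) - tabulate(end + 1, nbins = n + 1)
  data.frame(position = seq_len(n), coverage = cumsum(delta)[seq_len(n)])
}

occurrence <- coverage_by_position(data$start_pos, data$end_pos)
print(occurrence)
```

Memory here scales with the largest position rather than with the total length of all intervals, so it never expands the 50M intervals into individual positions; with very large coordinates the vectors of length max(end_pos) can still be big, which is where the compressed Rle from IRanges helps.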