Is there a quick way to transform intervals (start and end) into a list of the numbers in each interval?

I have a file of about 50M lines containing intervals like these:

> data
  start_pos end_pos
1         1      10
2         3       6
3         5       9
4         6      11

I would like to get a table of position occurrences, i.e. the coverage at each position (how many intervals include it), like this:

> occurence
position    coverage
1    1
2    1
3    2
4    2
5    3
6    4
7    3
8    3
9    3
10    2
11    1
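To pin down the target: the coverage at a position p is the number of intervals with start_pos <= p <= end_pos. The brute-force sketch below restates that definition on the four example intervals. The helper name coverage_at is hypothetical, and the approach costs O(positions x intervals), so it only serves as a reference for checking faster methods, not as something to run on 50M lines.

```r
# The four example intervals from the question
start_pos <- c(1, 3, 5, 6)
end_pos   <- c(10, 6, 9, 11)

# Hypothetical helper: coverage at p = number of intervals containing p.
# sum() over a logical vector returns an integer count.
coverage_at <- function(p) sum(start_pos <= p & p <= end_pos)

position  <- seq_len(max(end_pos))
occurence <- data.frame(position = position,
                        coverage = vapply(position, coverage_at, integer(1)))
occurence
```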

Is there a fast, memory-efficient way to do this in R?

My plan was to loop over the rows, append each interval's sequence start_pos:end_pos to a vector, and then convert the final vector into a table:

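count <- c()
for (row in seq_len(nrow(data))) {
  # append every position covered by this interval; growing the vector with
  # c() copies it on each iteration, so the loop gets slower as it goes
  count <- c(count, data[row, ]$start_pos:data[row, ]$end_pos)
}

occurence <- table(count)  # number of intervals covering each position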
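For comparison, the same plan can be written without the explicit loop. This sketch assumes R >= 4.0.0, where base sequence() accepts a from argument, and uses a small data frame shaped like the question's data. It removes the repeated copying, but it still materializes every covered position, so memory grows with the total length of all intervals.

```r
# Example data in the same shape as the question's `data`
data <- data.frame(start_pos = c(1L, 3L, 5L, 6L),
                   end_pos   = c(10L, 6L, 9L, 11L))

# sequence(nvec, from) concatenates from[i]:(from[i] + nvec[i] - 1) for all
# rows in one vectorized call (the `from` argument needs R >= 4.0.0)
count <- sequence(data$end_pos - data$start_pos + 1L, from = data$start_pos)

occurence <- table(count)  # number of intervals covering each position
occurence
```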

The problem is that my file is huge, and this takes far too much time and memory.

Answer:

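The Bioconductor IRanges package does this quickly and memory-efficiently with its coverage() function, which returns a run-length-encoded (Rle) vector of per-position coverage: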

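library(IRanges)
# the example intervals; for the real file use
# IRanges(start = data$start_pos, end = data$end_pos)
ir = IRanges(start = c(1, 3, 5, 6), end = c(10, 6, 9, 11))
coverage(ir)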

which, converted to a data frame, gives

> coverage(ir) |> as.data.frame()
   value
1      1
2      1
3      2
4      2
5      3
6      4
7      3
8      3
9      3
10     2
11     1
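If installing Bioconductor is not an option, a base-R alternative is a difference array with a running sum (a swapped-in technique, not what IRanges does internally): add +1 where each interval starts and -1 just after it ends, then take cumsum(). This is a sketch assuming positions are positive integers (tabulate() ignores values outside 1..nbins). The function name coverage_cumsum is hypothetical. Memory is about 4 bytes per position up to max(end_pos), independent of interval count or length, though for very large coordinates the Rle returned by coverage() stays far more compact.

```r
# Hypothetical helper: per-position coverage via a difference array + cumsum
coverage_cumsum <- function(start_pos, end_pos) {
  n <- max(end_pos) + 1L
  # +1 at each start, -1 at the position right after each end
  delta <- tabulate(start_pos, nbins = n) - tabulate(end_pos + 1L, nbins = n)
  # the running sum is the coverage; drop the trailing position max(end_pos) + 1
  cumsum(delta)[seq_len(n - 1L)]
}

data <- data.frame(start_pos = c(1L, 3L, 5L, 6L),
                   end_pos   = c(10L, 6L, 9L, 11L))
covg <- coverage_cumsum(data$start_pos, data$end_pos)

occurence <- data.frame(position = seq_along(covg), coverage = covg)
occurence

# Optional: base rle() gives a run-length view similar in spirit to an Rle
rle(covg)
```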