Convert split interval coordinates to continuous interval coordinates-CodePudding

I have a data.frame where each id maps to several discontiguous linear intervals, which do not overlap, and are sorted in ascending order:

df <- data.frame(id = c(rep("id1",3),rep("id2",4)),
                 start = c(101,220,307,550,658,742,855),
                 end = c(154,246,326,625,712,811,944),
                 stringsAsFactors = F)

I'd like to add new start and end columns that will cumulatively sum up the interval widths and show the cumulative start and end coordinates.

So, for the example df above, these new start and end columns (cum.start, cum.end) will be:

df$cum.start <- c(1,55,82,1,77,132,202)
df$cum.end <- c(54,81,101,76,131,201,291)

Any dplyr way of doing this?

CodePudding user response：

We can use lag and cumsum:

library(dplyr)

df1 %>% 
  group_by(id) %>% 
  mutate(cum.start = c(1, lag(cumsum(end - start   1))[-1]   1) ,
         cum.end = cumsum(end - start   1))

#> # A tibble: 7 x 5
#> # Groups:   id [2]
#>   id    start   end cum.start cum.end
#>   <chr> <dbl> <dbl>     <dbl>   <dbl>
#> 1 id1     101   154         1      54
#> 2 id1     220   246        55      81
#> 3 id1     307   326        82     101
#> 4 id2     550   625         1      76
#> 5 id2     658   712        77     131
#> 6 id2     742   811       132     201
#> 7 id2     855   944       202     291

CodePudding user response：

Please find below one possible solution with dplyr

Code

df %>% 
  group_by(id) %>% 
  mutate( diff = end-start 1,
               cum.end = cumsum(diff),
               cum.start = cum.end - diff   1) %>% 
  select(-diff) %>% 
  relocate("cum.end", .after = last_col())

Output

#> # A tibble: 7 x 5
#> # Groups:   id [2]
#>   id    start   end cum.start cum.end
#>   <chr> <dbl> <dbl>     <dbl>   <dbl>
#> 1 id1     101   154         1      54
#> 2 id1     220   246        55      81
#> 3 id1     307   326        82     101
#> 4 id2     550   625         1      76
#> 5 id2     658   712        77     131
#> 6 id2     742   811       132     201
#> 7 id2     855   944       202     291

^{Created on 2021-12-15 by the reprex package (v2.0.1)}