I have a data.frame
where each id
maps to several discontiguous linear intervals, which do not overlap, and are sorted in ascending order:
df <- data.frame(id = c(rep("id1",3),rep("id2",4)),
start = c(101,220,307,550,658,742,855),
end = c(154,246,326,625,712,811,944),
stringsAsFactors = F)
I'd like to add new start
and end
columns that will cumulatively sum up the interval widths and show the cumulative start and end coordinates.
So, for the example df
above, these new start
and end
columns (cum.start
, cum.end
) will be:
df$cum.start <- c(1,55,82,1,77,132,202)
df$cum.end <- c(54,81,101,76,131,201,291)
Any dplyr
way of doing this?
CodePudding user response:
We can use lag
and cumsum
:
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(cum.start = c(1, lag(cumsum(end - start 1))[-1] 1) ,
cum.end = cumsum(end - start 1))
#> # A tibble: 7 x 5
#> # Groups: id [2]
#> id start end cum.start cum.end
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 id1 101 154 1 54
#> 2 id1 220 246 55 81
#> 3 id1 307 326 82 101
#> 4 id2 550 625 1 76
#> 5 id2 658 712 77 131
#> 6 id2 742 811 132 201
#> 7 id2 855 944 202 291
CodePudding user response:
Please find below one possible solution with dplyr
- Code
df %>%
group_by(id) %>%
mutate( diff = end-start 1,
cum.end = cumsum(diff),
cum.start = cum.end - diff 1) %>%
select(-diff) %>%
relocate("cum.end", .after = last_col())
- Output
#> # A tibble: 7 x 5
#> # Groups: id [2]
#> id start end cum.start cum.end
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 id1 101 154 1 54
#> 2 id1 220 246 55 81
#> 3 id1 307 326 82 101
#> 4 id2 550 625 1 76
#> 5 id2 658 712 77 131
#> 6 id2 742 811 132 201
#> 7 id2 855 944 202 291
Created on 2021-12-15 by the reprex package (v2.0.1)