I have this dataset
id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B")
event <- c("dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface",
"dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface")
duration <- c(55, 56, 40, 96, 58, 14, 43, 77, 19, 28, 34, 63, 29, 47, 61, 90)
df <- tibble(id, event, duration)
I need that the each "dive" row the duration proportion of surface be calculated using the subsequent "surface", and insert the result into a new column. All this separated by "id".
proportion = surface/dive surface
#Output dataframe
# A tibble: 8 x 4
id event duration proportion
1 A surface 56 x
2 A surface 96 x
3 A surface 14 x
4 A surface 77 x
5 B surface 28 x
6 B surface 63 x
7 B surface 47 x
8 B surface 90 x
CodePudding user response:
We can use gl
to create the grouping index every 2 rows, and then create the column 'proportion' by dividing the 'duration' where event value is 'surface' (event == 'surface'
) with the sum
of 'duration'
library(dplyr)
df %>%
group_by(id) %>%
group_by(grp = as.integer(gl(n(), 2, n())), .add = TRUE) %>%
mutate(proportion = duration[event == 'surface']/sum(duration)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 16 × 4
id event duration proportion
<chr> <chr> <dbl> <dbl>
1 A dive 55 0.505
2 A surface 56 0.505
3 A dive 40 0.706
4 A surface 96 0.706
5 A dive 58 0.194
6 A surface 14 0.194
7 A dive 43 0.642
8 A surface 77 0.642
9 B dive 19 0.596
10 B surface 28 0.596
11 B dive 34 0.649
12 B surface 63 0.649
13 B dive 29 0.618
14 B surface 47 0.618
15 B dive 61 0.596
16 B surface 90 0.596