I want to apply a function (in this case calculating a mean) on subsets of a dataset. The dataset includes (a variable amount of) measurements, taken on different intervals. I want to perform multiple steps, but I am a bit tangled up in the code.
I want to calculate a mean of
DF$value
pertime
on a period of 1 perID
. Cannot figure out how to loop theapply()
through the increments oftime
time
needs to be renamed asyear
; 0-1 = year 1, 1-2 = year 2If no 'value' is found in a year, the previously calculated mean of the previous year needs to be carried forward
The function probably needs multiple arguments into an apply() on a subset, but I cannot figure out how. Also the time subset needs to cycle through a period of 1 time
per ID
DF
ID time value
1 0.1 3
1 0.5 5
1 2.1 6
1 3.3 12
2 0.3 1
2 0.4 3
2 0.6 5
2 1.2 4
2 1.5 8
2 2.6 2
2 2.7 14
3 0.1 1.1
3 0.4 2.3
3 1.3 6
3 1.5 3
3 1.6 6
So it will turn into
DF2
ID year mean_value
1 1 4
1 2 4 (=carried forward)
1 3 6
1 4 12
2 1 3
2 2 6
2 3 8
3 2 5
CodePudding user response:
Here is one option -
- Use
ceiling
to round up the time values. - for each
ID
andyear
calculate the average value. - Use
complete
to create the missing year value. fill
to carry forward the average value.
library(dplyr)
library(tidyr)
df %>%
group_by(ID, year = ceiling(time)) %>%
summarise(mean_value = mean(value)) %>%
complete(year = min(year):max(year)) %>%
fill(mean_value) %>%
ungroup
# ID year mean_value
# <int> <dbl> <dbl>
#1 1 1 4
#2 1 2 4
#3 1 3 6
#4 1 4 12
#5 2 1 3
#6 2 2 6
#7 2 3 8
#8 3 1 1.7
#9 3 2 5
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L), time = c(0.1, 0.5, 2.1, 3.3, 0.3, 0.4,
0.6, 1.2, 1.5, 2.6, 2.7, 0.1, 0.4, 1.3, 1.5, 1.6), value = c(3,
5, 6, 12, 1, 3, 5, 4, 8, 2, 14, 1.1, 2.3, 6, 3, 6)),
class = "data.frame", row.names = c(NA, -16L))