Apply function to subsets of dataset-CodePudding

I want to apply a function (in this case calculating a mean) on subsets of a dataset. The dataset includes (a variable amount of) measurements, taken on different intervals. I want to perform multiple steps, but I am a bit tangled up in the code.

I want to calculate a mean of DF$value per time on a period of 1 per ID. Cannot figure out how to loop the apply() through the increments of time
time needs to be renamed as year; 0-1 = year 1, 1-2 = year 2
If no 'value' is found in a year, the previously calculated mean of the previous year needs to be carried forward

The function probably needs multiple arguments into an apply() on a subset, but I cannot figure out how. Also the time subset needs to cycle through a period of 1 time per ID

DF
ID    time     value
1     0.1      3
1     0.5      5
1     2.1      6
1     3.3      12
2     0.3      1
2     0.4      3
2     0.6      5
2     1.2      4
2     1.5      8
2     2.6      2
2     2.7      14
3     0.1      1.1
3     0.4      2.3
3     1.3      6
3     1.5      3
3     1.6      6

So it will turn into

DF2
ID    year     mean_value
1     1        4
1     2        4   (=carried forward)
1     3        6
1     4        12
2     1        3
2     2        6
2     3        8
3     2        5

CodePudding user response：

Here is one option -

Use ceiling to round up the time values.
for each ID and year calculate the average value.
Use complete to create the missing year value.
fill to carry forward the average value.

library(dplyr)
library(tidyr)

df %>%
  group_by(ID, year = ceiling(time)) %>%
  summarise(mean_value = mean(value)) %>%
  complete(year = min(year):max(year)) %>%
  fill(mean_value) %>%
  ungroup

#     ID  year mean_value
#  <int> <dbl>      <dbl>
#1     1     1        4  
#2     1     2        4  
#3     1     3        6  
#4     1     4       12  
#5     2     1        3  
#6     2     2        6  
#7     2     3        8  
#8     3     1        1.7
#9     3     2        5

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L), time = c(0.1, 0.5, 2.1, 3.3, 0.3, 0.4, 
0.6, 1.2, 1.5, 2.6, 2.7, 0.1, 0.4, 1.3, 1.5, 1.6), value = c(3, 
5, 6, 12, 1, 3, 5, 4, 8, 2, 14, 1.1, 2.3, 6, 3, 6)), 
class = "data.frame", row.names = c(NA, -16L))