Home > Enterprise >  R: Continuously updating a mean with each data point, with conditional resetting
R: Continuously updating a mean with each data point, with conditional resetting

Time:04-20

Example: the subject tests the temperature of the water 8 times, and after each test he reports whether he thinks the temperature has changed. I want to calculate and update the current mean temperature with each test, but then reset the mean every time the subject reports a change (change = 1). Such as:

test  temperature change mean
1          10      0     10.0
2           9      0      9.5
3          11      0     10.0
4          15      1     15.0
5          16      0     15.5
6          17      0     16.0
7           8      1      8.0
8           7      0      7.5

Here, the mean temperature at the second test is the average of tests 1 and 2, the mean at the third test is the average of 1, 2 and 3, but then the mean resets at test 4 (and so on). In my much more complex data set, I would have the row numbers of each change (e.g., c(4,7)).

Any suggestions of how to do this? Thanks!

df = data.frame(test=c(1,2,3,4,5,6,7,8), temperature=c(10,9,11,15,16,17,8,7), change=c(0,0,0,1,0,0,1,0))

CodePudding user response:

Using a cumulative mean, which you can get from cumsum divided by a growing sequence starting at 1 (i.e. counting the elements). The grouping may be done using ave, where we group on the cumsum of the change.

transform(dat, mean=ave(temperature, cumsum(change), 
                        FUN=\(x) cumsum(x)/seq.int(x)))
#   test temperature change mean
# 1    1          10      0 10.0
# 2    2           9      0  9.5
# 3    3          11      0 10.0
# 4    4          15      1 15.0
# 5    5          16      0 15.5
# 6    6          17      0 16.0
# 7    7           8      1  8.0
# 8    8           7      0  7.5

CodePudding user response:

library(dplyr)   
df %>%
  group_by(grp = cumsum(change)) %>%
  mutate(mean = cummean(temperature))

# A tibble: 8 x 5
# Groups:   grp [3]
   test temperature change   grp mean
  <dbl>       <dbl>  <dbl> <dbl> <dbl>
1     1          10      0     0  10  
2     2           9      0     0   9.5
3     3          11      0     0  10  
4     4          15      1     1  15  
5     5          16      0     1  15.5
6     6          17      0     1  16  
7     7           8      1     2   8  
8     8           7      0     2   7.5


library(data.table)
setDT(df)[ , mean := cummean(temperature),by=cumsum(change)][]
   test temperature change mean
1:    1          10      0 10.0
2:    2           9      0  9.5
3:    3          11      0 10.0
4:    4          15      1 15.0
5:    5          16      0 15.5
6:    6          17      0 16.0
7:    7           8      1  8.0
8:    8           7      0  7.5
  •  Tags:  
  • r
  • Related