Example: the subject tests the temperature of the water 8 times, and after each test he reports whether he thinks the temperature has changed. I want to calculate and update the current mean temperature with each test, but then reset the mean every time the subject reports a change (change = 1
). Such as:
test temperature change mean
1 10 0 10.0
2 9 0 9.5
3 11 0 10.0
4 15 1 15.0
5 16 0 15.5
6 17 0 16.0
7 8 1 8.0
8 7 0 7.5
Here, the mean temperature at the second test is the average of tests 1 and 2, the mean at the third test is the average of 1, 2 and 3, but then the mean resets at test 4 (and so on). In my much more complex data set, I would have the row numbers of each change (e.g., c(4,7)
).
Any suggestions of how to do this? Thanks!
df = data.frame(test=c(1,2,3,4,5,6,7,8), temperature=c(10,9,11,15,16,17,8,7), change=c(0,0,0,1,0,0,1,0))
CodePudding user response:
Using a cumulative mean, which you can get from cumsum
divided by a growing seq
uence starting at 1 (i.e. counting the elements). The grouping may be done using ave
, where we group on the cumsum
of the change.
transform(dat, mean=ave(temperature, cumsum(change),
FUN=\(x) cumsum(x)/seq.int(x)))
# test temperature change mean
# 1 1 10 0 10.0
# 2 2 9 0 9.5
# 3 3 11 0 10.0
# 4 4 15 1 15.0
# 5 5 16 0 15.5
# 6 6 17 0 16.0
# 7 7 8 1 8.0
# 8 8 7 0 7.5
CodePudding user response:
library(dplyr)
df %>%
group_by(grp = cumsum(change)) %>%
mutate(mean = cummean(temperature))
# A tibble: 8 x 5
# Groups: grp [3]
test temperature change grp mean
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 10 0 0 10
2 2 9 0 0 9.5
3 3 11 0 0 10
4 4 15 1 1 15
5 5 16 0 1 15.5
6 6 17 0 1 16
7 7 8 1 2 8
8 8 7 0 2 7.5
library(data.table)
setDT(df)[ , mean := cummean(temperature),by=cumsum(change)][]
test temperature change mean
1: 1 10 0 10.0
2: 2 9 0 9.5
3: 3 11 0 10.0
4: 4 15 1 15.0
5: 5 16 0 15.5
6: 6 17 0 16.0
7: 7 8 1 8.0
8: 8 7 0 7.5