I have a dataset with multiple individuals (Cow_ID) and their daily difference in temperature looking like this:
df<-data.frame("Cow_ID"=c("192","192","192","192","201","201","201","201","652","652","652","652"),"Day"=c("1","2","3","4","1","2","3","4","1","2","3","4"),"Temp_Diff"=c("0.2","0.1","na","0.8","0.4","0.1","0.7","na","0.1","0.3","na","0.8"))
I want to calculate a cumulative mean for each individual in a new column. I need it to exclude the na's too.
I managed to get some code working to run this for the whole data set, but I could not find a way to get it to stop at the end of an individual and start again for the next one.
df<-df%>%mutate(new=replace(Temp_Diff,!is.na(Temp_Diff),cummean(Temp_Diff[!is.na(Temp_Diff)])))%>%fill(new)
Please excuse the sloppy coding, I am very new to this!
CodePudding user response:
If it is by 'Cow_ID', then do a group by on that column. Also, make sure to convert the "na"
to NA
before applying the is.na
as well as convert the type as the 'Temp_Diff' column was character
class as a result of "na" character element which doesn't work with cummean
library(dplyr)
df %>%
na_if("na") %>%
type.convert(as.is = TRUE) %>%
group_by(Cow_ID) %>%
mutate(new = replace(Temp_Diff, !is.na(Temp_Diff),
cummean(Temp_Diff[!is.na(Temp_Diff)]))) %>%
ungroup()
-output
# A tibble: 12 × 4
Cow_ID Day Temp_Diff new
<int> <int> <dbl> <dbl>
1 192 1 0.2 0.2
2 192 2 0.1 0.15
3 192 3 NA NA
4 192 4 0.8 0.367
5 201 1 0.4 0.4
6 201 2 0.1 0.25
7 201 3 0.7 0.4
8 201 4 NA NA
9 652 1 0.1 0.1
10 652 2 0.3 0.2
11 652 3 NA NA
12 652 4 0.8 0.4
Or using data.table
, convert the 'data.frame' to 'data.table' (setDT
), conver the 'Temp_Diff' to numeric
column with as.numeric
returns NA
for "na"
while doing the force conversion, then create the 'new' column by specifying a logical condition in i
to subset only the non-NA elements, do the cummean
on 'j', grouped by 'Cow_ID' and assign (:=
) to a new column 'new'
library(data.table)
setDT(df)[, Temp_Diff := as.numeric(Temp_Diff)][!is.na(Temp_Diff),
new := cummean(Temp_Diff), by = Cow_ID]