Here, I made a data
as simple as possible. id
means the personal id number.
In yearmonthweek
column , 2012052
means 2nd week of May,2012.
Wt
means weight and Ht
means height.
data
looks like this:
data<-data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
yearmonthweek=c(2012052,2012052,2012052,2012053,2012053,2012051,2012052,2012052,2012052,2012052),
Wt=c(61,60,NA,NA,NA,63,62,62,NA,63),
Ht=c(173,174,174,175,NA,173,174,173,173,174))
I want to make this data weekly basis for each id
. That is, within each id
, the values of yearmonthweek
cannot be duplicated. Also, I want to make extra columns to represent average,max, and mean . My expected output looks like this:
data<-data.frame(id=c(1,1,2,2),
yearmonthweek=c(2012052,2012053,2012051,2012052),
Wt_avg=c(60.5,NA,63,62.333),
Wt_max=c(61,NA,63,63),
Wt_min=c(60,NA,63,62),
Ht_avg=c(173.5,174.5,173,173.667),
Ht_max=c(174,175,173,174),
Ht_min=c(173,174,173,173))
CodePudding user response:
Use across
to get apply the same function across multiple columns, and use a list to get multiple functions. Since some of your groups have only NAs, you should do an ifelse
to get NA
instead of NaN
(and warning messages) in those cases.
library(dplyr)
data %>%
group_by(id, yearmonthweek) %>%
summarise(across(c(Wt, Ht), list(avg = ~ ifelse(all(is.na(.x)), NA, mean(.x, na.rm = TRUE)),
min = ~ ifelse(all(is.na(.x)), NA, min(.x, na.rm = TRUE)),
max = ~ ifelse(all(is.na(.x)), NA, max(.x, na.rm = TRUE)))))
id yearmonthweek Wt_avg Wt_min Wt_max Ht_avg Ht_min Ht_max
1 1 2012052 60.5 60 61 174. 173 174
2 1 2012053 NA NA NA 175 175 175
3 2 2012051 63 63 63 173 173 173
4 2 2012052 62.3 62 63 174. 173 174
You can also use this soft
function which removes this issue:
soft <- function(x, f, ...) ifelse(all(is.na(x)), NA, f(x, na.rm = TRUE, ...))
data %>%
group_by(id, yearmonthweek) %>%
summarise(across(c(Wt, Ht), list(avg = ~ soft(.x, mean),
min = ~ soft(.x, min),
max = ~ soft(.x, max))))