Make extra variables for max, min and average in R-CodePudding

Here, I made a data as simple as possible. id means the personal id number. In yearmonthweek column , 2012052 means 2nd week of May,2012. Wt means weight and Ht means height.

data looks like this:

data<-data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
                 yearmonthweek=c(2012052,2012052,2012052,2012053,2012053,2012051,2012052,2012052,2012052,2012052),
                 Wt=c(61,60,NA,NA,NA,63,62,62,NA,63),
                 Ht=c(173,174,174,175,NA,173,174,173,173,174))

I want to make this data weekly basis for each id. That is, within each id, the values of yearmonthweek cannot be duplicated. Also, I want to make extra columns to represent average,max, and mean . My expected output looks like this:

data<-data.frame(id=c(1,1,2,2),
                 yearmonthweek=c(2012052,2012053,2012051,2012052),
                 Wt_avg=c(60.5,NA,63,62.333),
                 Wt_max=c(61,NA,63,63),
                 Wt_min=c(60,NA,63,62),
                 Ht_avg=c(173.5,174.5,173,173.667),
                 Ht_max=c(174,175,173,174),
                 Ht_min=c(173,174,173,173))

CodePudding user response：

Use across to get apply the same function across multiple columns, and use a list to get multiple functions. Since some of your groups have only NAs, you should do an ifelse to get NA instead of NaN (and warning messages) in those cases.

library(dplyr)
data %>% 
  group_by(id, yearmonthweek) %>% 
  summarise(across(c(Wt, Ht), list(avg = ~ ifelse(all(is.na(.x)), NA, mean(.x, na.rm = TRUE)), 
                                   min = ~ ifelse(all(is.na(.x)), NA, min(.x, na.rm = TRUE)), 
                                   max = ~ ifelse(all(is.na(.x)), NA, max(.x, na.rm = TRUE)))))

     id yearmonthweek Wt_avg Wt_min Wt_max Ht_avg Ht_min Ht_max
1     1       2012052   60.5     60     61   174.    173    174
2     1       2012053   NA       NA     NA   175     175    175
3     2       2012051   63       63     63   173     173    173
4     2       2012052   62.3     62     63   174.    173    174

You can also use this soft function which removes this issue:

soft <- function(x, f, ...) ifelse(all(is.na(x)), NA, f(x, na.rm = TRUE, ...))

data %>% 
  group_by(id, yearmonthweek) %>% 
  summarise(across(c(Wt, Ht), list(avg = ~ soft(.x, mean),
                                   min = ~ soft(.x, min),
                                   max = ~ soft(.x, max))))