Home > Blockchain >  Conditional aggregation into a mean value
Conditional aggregation into a mean value

Time:11-17

I have the following situation:

  ID           VAR       HSP     VAR2      CODE
00001           56        B       345       1       
00001           52        B       306       1       
00002           5         C       32        2       
00002           22        A       31        2       
00003           55        D       23        1       
00003           5         E       87        1       

What I need to do is to: for replicated IDs perform the mean of VAR and VAR2 (i.e. all the numeric variables. I have many other numeric columns). The mean should not be done for the column code. Moreover since I have different types of HSP for the final data frame I need to keep the first as it appears in the sorting. The same for the column code.

Desired output:

  ID           VAR       HSP     VAR2      CODE
00001           54        B       325.5     1       
00002           13.5      C       31.5      2       
00003           30        D       55        1       

CodePudding user response:

For this specific case, you can summarise the IDs like this.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(VAR = mean(VAR), HSP = first(HSP), VAR2 = mean(VAR2), CODE = first(CODE))

For a more generalized solution, here's how you get the mean for all numerical columns and the first value for all non-numerical values.

library(dplyr)

df %>% 
  group_by(ID) %>% 
  summarise(across(where(~ is.numeric(.x)), mean),
            across(where(~ !is.numeric(.x)), first))
  •  Tags:  
  • r
  • Related