Home > Mobile >  Grouping rows by multiple keys and get the average of numerical and mode of categorical
Grouping rows by multiple keys and get the average of numerical and mode of categorical

Time:10-21

I'm having a dataframe with both numerical and categorical variables.

The data frame has 2 keys where I need group by. When the rows are grouped by keys so & li, then categorical variables need to be selected based on mode & numerical by average.

   SO      LI        A         B
1 2000     20        P          22
2 2000     20        P          40
3 1000     10        Q          80

The output needs to be,

   SO      LI        A         B
1 2000     20        P          31
2 1000     10        Q          80

I used the following code so far.

library(plyr)
groupColumns = c("so","li")
dataColumns = c(colnames(numericalColumns(Planning_DF)))
res = ddply(Planning_DF, groupColumns, function(x) colMeans(x[dataColumns]))
head(res)

so the numerical column grouping & average happening. How to get the categorical variables Mode to this?

CodePudding user response:

It is easier with dplyr

library(dplyr)
groupColumns = c("SO","LI")
Planning_DF %>%
    group_by(across(all_of(groupColumns))) %>%
    summarise(across(where(is.numeric), mean), 
           across(where(is.character), Mode), .groups = 'drop')

-output

# A tibble: 2 × 4
     SO    LI     B A    
  <int> <int> <dbl> <chr>
1  1000    10    80 Q    
2  2000    20    31 P    

where

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

If we need to do this in plyr, use colwise with a custom function that does the mean or Mode based on the column type

f1 <- function(x) if(is.numeric(x)) mean(x, na.rm = TRUE) else Mode(x)
plyr::ddply(Planning_DF, .variables = groupColumns, 
     .fun = plyr::colwise(f1))

-output

    SO LI A  B
1 1000 10 Q 80
2 2000 20 P 31

data

Planning_DF <- structure(list(SO = c(2000L, 2000L, 1000L), LI = c(20L, 20L, 
10L), A = c("P", "P", "Q"), B = c(22L, 40L, 80L)), 
class = "data.frame", row.names = c("1", 
"2", "3"))
  •  Tags:  
  • r
  • Related