Create summary table with averages of column subsets-CodePudding

I am very new to this so be please nice!

My dataset has 3 columns - Col A (chr, 'X' and 'Y'), Col B (chr, 'red', 'blue' 'yellow'), Col C (numeric).

Col_A <- c('X', 'X', 'X', 'Y', 'Y', 'X', 'Y', 'X')
Col_B <- c('red', 'yellow', 'blue', 'blue', 'yellow', 'red', 'blue', 'red')
Col_C <- c(12, 22, 13, 14, 25, 17, 11, 10)

dataset <- as.data.frame(cbind(Col_A, Col_B, Col_C))

How can I create a summary table in R that's like

tbl_summary(dataset, by = 'Col_A')

output shown here

but gives the mean values of Col C for every subgroup in Col B? enter image description here

Right now the code gives me counts for Col B and then the mean value of Col C for 'X' and 'Y'.

Does any of that even make sense?!

CodePudding user response：

Is this what you are looking for? Updated with base R and dplyr options.

# simpler way to create a data.frame and preserve variable types.

Col_A <- c('X', 'X', 'X', 'Y', 'Y', 'X', 'Y', 'X')
Col_B <- c('red', 'yellow', 'blue', 'blue', 'yellow', 'red', 'blue', 'red')
Col_C <- c(12, 22, 13, 14, 25, 17, 11, 10)

dataset <- data.frame(Col_A, Col_B, Col_C)

#Using base R `aggregate`
aggregate(Col_C ~ Col_A   Col_B, data = dataset, mean)

# with `dplyr`
library(dplyr, warn.conflicts = FALSE)

dataset <- 
dataset %>% 
  group_by(Col_A, Col_B) %>% 
  summarise(mean = mean(Col_C, na.rm = TRUE))

dataset

#> # A tibble: 5 x 3
#> # Groups:   Col_A [2]
#>   Col_A Col_B   mean
#>   <chr> <chr>  <dbl>
#> 1 X     blue    13  
#> 2 X     red     13  
#> 3 X     yellow  22  
#> 4 Y     blue    12.5
#> 5 Y     yellow  25

# and as a `gt` table
library(gt)
gt(dataset)

^{Created on 2021-12-23 by the reprex package (v2.0.1)}

CodePudding user response：

Use the data.table structure and it's by parameter

library(data.table)
dataset <- data.table(Col_A, Col_B, Col_C)
dataset[, mean(Col_C), keyby=Col_B]