Home > OS >  Calculate quantiles with grouping for multiple columns with dplyr
Calculate quantiles with grouping for multiple columns with dplyr

Time:07-14

Tidyverse provides an example of how to calculate quantiles with grouping for a single column, x. Below, I've modified that example to include column z which is character and a missing value (e.g., so needing to use na.rm=TRUE).

I want to modify the below modified code so that works it works for all numeric columns (i.e., is.numeric)

library(tidyverse)
df <- tibble(
  grp = rep(c("A","B"), each = 5), 
  x = c(rnorm(5, -0.25, 1), rnorm(5, 0, 1.5)),
  y = c(rnorm(5, 0.25, 1), rnorm(5, 0, 0.5)),
  z = letters[1:10],
)

df[3,2] <-NA
df

# A tibble: 10 × 4
   grp         x       y z    
   <chr>   <dbl>   <dbl> <chr>
 1 A      1.02   -0.653  a    
 2 A     -0.370  -0.185  b    
 3 A     NA      -1.49   c    
 4 A     -0.336   0.408  d    
 5 A     -0.0526  1.90   e    
 6 B      1.45   -0.0769 f    
 7 B      1.36   -1.11   g    
 8 B      0.0972 -0.0394 h    
 9 B     -1.72   -0.198  i    
10 B     -2.50    0.364  j 

quibble <- function(x, q = c(0.25, 0.5, 0.75), dropNA = TRUE) {
  tibble(x = quantile(x, q, na.rm = dropNA), q = q)
}

df %>% 
  group_by(grp) %>% 
  summarise(x = list(quibble(x, c(0.25, 0.75), dropNA = TRUE))) %>% 
  tidyr::unnest(x)

# A tibble: 4 × 3
  grp        x     q
  <chr>  <dbl> <dbl>
1 A     -0.345  0.25
2 A      0.215  0.75
3 B     -1.72   0.25
4 B      1.36   0.75

My attempt has included the below which was inspired from here

df %>% 
  group_by(grp) %>%
  summarise(across(everything(), where(is.numeric) ~ quibble(.x, c(0.25, 0.75), dropNA = TRUE))) %>% 
  tidyr::unnest(.x)

I'm aware of this page which includes sapply options in comments and a solution that I have not been able to apply to this situation (multiple numeric and columns, missing values, and grouping).

CodePudding user response:

Here, we don't need both everything() and is.numeric. In addition, there should be a , between the where expression and the lambda expression

library(dplyr)
library(tidyr)
df %>% 
  group_by(grp) %>%
  summarise(across( where(is.numeric), ~ quibble(.x, c(0.25, 0.75), 
     dropNA = TRUE)), .groups = 'drop') %>% 
  tidyr::unnest(where(is_tibble), names_repair = 'unique', names_sep = "_")

-output

# A tibble: 4 × 5
  grp        x_x   x_q    y_x   y_q
  <chr>    <dbl> <dbl>  <dbl> <dbl>
1 A      0.00519  0.25 -0.888  0.25
2 A      0.723    0.75  1.09   0.75
3 B     -1.54     0.25  0.213  0.25
4 B     -0.938    0.75  0.439  0.75
  • Related