Create parameterized summaries of a column

Time:06-30

I have a tibble and I want to create several summaries of the same column, specifically the first, second and third quartiles.

To do it, I create a named list of functions and that works fine.

library("tidyverse")

set.seed(1234)

df <- tibble(x = rnorm(100))
df %>%
  summarise(
    across(x,
      list(
        Q1 = ~ quantile(., 1 / 4),
        Q2 = ~ quantile(., 2 / 4),
        Q3 = ~ quantile(., 3 / 4)
      ),
      .names = "{.fn}"
    )
  )
#> # A tibble: 1 × 3
#>       Q1     Q2    Q3
#>    <dbl>  <dbl> <dbl>
#> 1 -0.895 -0.385 0.471

Can I achieve this by specifying only the list of probabilities to pass to quantile? That would save me typing and, more importantly, avoid hard-coding the arguments passed to the aggregating function.

The following doesn't work because it creates one row per probability rather than one column.

df %>%
  summarise(
    across(x, quantile, 1:3 / 4)
  )
#> # A tibble: 3 × 1
#>        x
#>    <dbl>
#> 1 -0.895
#> 2 -0.385
#> 3  0.471

CodePudding user response:

You're almost there.

df <- tibble(x = rnorm(100))  # a new random draw, so the values differ from the question
df %>%
    summarise(
        across(x,
               map(1:3, ~partial(quantile, probs=./4)),
               .names = "Q{.fn}"
        )
    )

#> # A tibble: 1 x 3
#>       Q1     Q2    Q3
#>    <dbl>  <dbl> <dbl>
#> 1 -0.579 0.0815 0.475
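
A variation on the same idea, offered as a sketch rather than a definitive answer: name the list of functions so the column labels come from the probabilities instead of the list position. The names `probs`, `quantile_fns` and `result` are illustrative and do not appear in the original posts, and plain closures stand in for `partial()` here.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1234)
df <- tibble(x = rnorm(100))

# Hypothetical name: the probabilities to summarise by
probs <- c(0.25, 0.5, 0.75)

# One function per probability; the list names become the column names
quantile_fns <- probs %>%
  set_names(paste0("Q_", round(100 * probs))) %>%
  map(function(p) {
    force(p)  # fix the value of p inside the closure
    function(v) unname(quantile(v, probs = p))
  })

result <- df %>%
  summarise(across(x, quantile_fns, .names = "{.fn}"))

result
```

Changing `probs` is then the only edit needed to get a different set of summary columns.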

CodePudding user response:

If you define the quantiles like this:

Q <- c(0.25, 0.5, 0.75)

Then the following code will produce columns of the appropriate quantiles with sensible labels:

df %>%
  summarise(
    across(x,
      setNames(
        lapply(Q, function(x) {
          # Start from the template ~quantile(., b), then replace b with x
          f <- ~ quantile(., b)
          f[2][[1]][[3]] <- x
          f
        }),
        paste("Q", round(100 * Q), sep = "_")
      ),
      .names = "{.fn}"
    )
  )
#> # A tibble: 1 x 3
#>     Q_25   Q_50  Q_75
#>    <dbl>  <dbl> <dbl>
#> 1 -0.895 -0.385 0.471

Created on 2022-06-29 by the reprex package (v2.0.1)
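
To see what the indexing in this answer does, it may help to take the formula apart on its own. This is a minimal base R sketch; the value `0.25` is only an example, and `f[[2]]` is used here because it reaches the same call as `f[2][[1]]`.

```r
# A one-sided formula is a call to `~` whose second element is the right-hand side
f <- ~ quantile(., b)

f[[1]]        # the `~` operator
f[[2]]        # the call quantile(., b)
f[[2]][[3]]   # the placeholder symbol b (third element: function, `.`, b)

# Swap the placeholder for a concrete probability
f[[2]][[3]] <- 0.25

f             # the formula now reads ~quantile(., 0.25)
```

Each element of the list passed to `across()` in the answer is a formula built this way, one per entry of `Q`.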
