dplyr summarize with a dynamic number of stats/conditions

Time:12-20

I want to summarize my data in different ways. Specifically, I want to count how many values are greater than a certain threshold.

I could easily do that with e.g.

library(tidyverse)
mtcars |>
  summarize(test1 = sum(mpg > 15, na.rm = TRUE))

However, how can I use summarize with several such thresholds, where the thresholds are supplied dynamically?

E.g. with an input like my_thresholds <- c(15, 20), I'd like to get the following output:

  test1 test2
1    26    14

I think one way could be to pass the thresholds to purrr::map and then bind_cols the two summaries afterwards. However, the summarize call itself is already wrapped in another purrr::map, i.e. my input is actually a list of data frames and I want to get the summaries for each list element:

input data:

input_data <- mtcars |>
  group_split(cyl)

And then my desired output would be one row per group.

One more note, the number of thresholds should also be dynamic, e.g. in one case I might have two thresholds, in another call I might have 5.

Answer:

What about something like this?

library(purrr)
# For each data frame in the list, count the mpg values above each threshold
input_data |>
  map(\(gp) map_int(my_thresholds, \(x) sum(gp$mpg > x, na.rm = TRUE)))

output

[[1]]
[1] 11 11

[[2]]
[1] 7 3

[[3]]
[1] 8 0
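
The result above is a list of unnamed integer vectors, while the question asks for one row per group with columns test1, test2, and so on. One possible way to get there is sketched below. It is not part of the original answer and assumes R >= 4.1 (for the \(x) lambda), purrr >= 1.0.0 (for list_rbind) and tibble >= 3.0.0 (for as_tibble_row). The name result and the "test" column prefix are only illustrative.

```r
library(dplyr)
library(purrr)
library(tibble)

my_thresholds <- c(15, 20)
input_data <- mtcars |>
  group_split(cyl)

# Name the thresholds; map_int() carries these names over to its result,
# so they end up as the column names test1, test2, ...
names(my_thresholds) <- paste0("test", seq_along(my_thresholds))

result <- input_data |>
  # one named integer vector of counts per data frame
  map(\(gp) map_int(my_thresholds, \(x) sum(gp$mpg > x, na.rm = TRUE))) |>
  # turn each named vector into a one-row tibble
  map(as_tibble_row) |>
  # stack the rows: one row per list element
  list_rbind()

result
```

With my_thresholds <- c(15, 20) this should give three rows holding the same counts as the list output above (11/11, 7/3, 8/0). Because the column names are built from seq_along(my_thresholds), the same code works unchanged with two thresholds or with five.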