Programmatically count grouped data using logic rules and string

Time:06-07

I have a grouped data frame that I want to summarise into "count of values less than x, y, z" per group. I can manually build the wide data frame I want with code like this:

library(tidyverse)
set.seed(1337)

df <- data.frame(group = rep(1:5, length.out = 100), num = sample(x = 1:400, size = 100, replace = TRUE))

manual <- df %>% 
  group_by(group) %>% 
  summarise(less_than_50 = sum(num < 50),
            less_than_100 = sum(num < 100),
            less_than_150 = sum(num < 150))

However, I'd like to define a vector of "less than" thresholds and generate these columns from it. I've done something similar in the past, using enframe(quantile()) to generate a long table of quantiles before pivoting:

pc <- c(0.1, 0.5, 0.9)

quantiles <- df %>% 
  group_by(group) %>% 
  summarise(enframe(quantile(num, pc))) %>% 
  pivot_wider(
    id_cols = group,
    names_from = name,
    values_from = value
  )

But I don't know how to define a custom function within the enframe() call. Ideally I'd like to apply it in something like the code below (which obviously doesn't work), with or without the pivot step, to get the same output as "manual":

levels <- c(50, 100, 150)

programmatic <- df %>% 
  group_by(group) %>% 
  summarise(cols = ("less_than", x), num < levels) %>% 
  pivot...

Any help greatly appreciated

Answer:

One way you could do it:

library(tidyverse)

set.seed(1337)

df <- data.frame(group = rep(1:5, length.out = 100), num = sample(x = 1:400, size = 100, replace = TRUE))

less_than <- function(x) {
  
  df %>%
    group_by(group) %>%
    summarise(less_than_ = sum(num < x)) %>%
    rename_with(~ str_c(., x), .cols = -group)
}

levels <- c(50, 100, 150)

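# map_dfr() stacks one summary per threshold, so each row has NA in the
# columns belonging to the other thresholds. Taking the mean with
# na.rm = TRUE collapses each group back to one row (counts become double).
map_dfr(levels, less_than) |> 
  group_by(group) |> 
  summarise(across(everything(), \(x) mean(x, na.rm = TRUE)))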
#> # A tibble: 5 × 4
#>   group less_than_50 less_than_100 less_than_150
#>   <int>        <dbl>         <dbl>         <dbl>
#> 1     1            4             5            10
#> 2     2            2             2             5
#> 3     3            2             6            11
#> 4     4            4             5             5
#> 5     5            1             7             9

# Manual result for comparison
df %>% 
  group_by(group) %>% 
  summarise(less_than_50 = sum(num < 50),
            less_than_100 = sum(num < 100),
            less_than_150 = sum(num < 150))
#> # A tibble: 5 × 4
#>   group less_than_50 less_than_100 less_than_150
#>   <int>        <int>         <int>         <int>
#> 1     1            4             5            10
#> 2     2            2             2             5
#> 3     3            2             6            11
#> 4     4            4             5             5
#> 5     5            1             7             9
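
A possible variation on the same idea, if you'd rather keep integer counts: build one summary per threshold and join them on group, instead of stacking with map_dfr() and averaging away the NAs. This is only a sketch. The helper name count_less_than() is made up for illustration, and the "{col_name}" := naming syntax assumes dplyr 1.0 or later.

```r
library(dplyr)
library(purrr)

set.seed(1337)

# Same data as above, with the group column recycled explicitly
df <- data.frame(group = rep(1:5, length.out = 100),
                 num = sample(1:400, size = 100, replace = TRUE))

# Hypothetical helper: one (group, less_than_<threshold>) summary per call
count_less_than <- function(data, threshold) {
  col_name <- paste0("less_than_", threshold)
  data |>
    group_by(group) |>
    summarise("{col_name}" := sum(num < threshold))
}

levels <- c(50, 100, 150)

# One small table per threshold, then successive left joins on group
joined <- levels |>
  map(\(lv) count_less_than(df, lv)) |>
  reduce(left_join, by = "group")

joined
```

Because nothing is averaged, the count columns stay <int>, so the result should match the manual summarise() column for column.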

Created on 2022-06-06 by the reprex package (v2.0.1)
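
Another option, closer to the enframe(quantile()) pattern from the question but without the pivot step: have a helper return a one-row tibble with one named column per threshold, and let summarise() unpack it. A minimal sketch, assuming dplyr 1.0 or later (where an unnamed data frame result inside summarise() is spread into columns). count_below() is a hypothetical name, not an existing function.

```r
library(dplyr)
library(tibble)

set.seed(1337)

# Same data as in the question, with the group column recycled explicitly
df <- data.frame(group = rep(1:5, length.out = 100),
                 num = sample(1:400, size = 100, replace = TRUE))

# Hypothetical helper: counts of values below each threshold,
# returned as a one-row tibble with columns less_than_<threshold>
count_below <- function(x, thresholds) {
  counts <- vapply(thresholds, function(thr) sum(x < thr), integer(1))
  names(counts) <- paste0("less_than_", thresholds)
  as_tibble_row(counts)
}

levels <- c(50, 100, 150)

# The unnamed tibble returned per group is unpacked into columns
programmatic <- df |>
  group_by(group) |>
  summarise(count_below(num, levels))

programmatic
```

Changing levels is then enough to add or drop columns, and the data is only grouped once rather than once per threshold.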
