Home > Blockchain >  Split data frame into groups and `count` several variables for each group
Split data frame into groups and `count` several variables for each group

Time:10-28

I'm stuck with this presumably easy task.

I have a data set that I want to group by on a certain variable and then for each of these groups, I want to get the count for several vars. I think that's a job for map, but I can't get my head around how that should look like. My idea was sth. like:

library(tidyverse)

count_vars <- c("vs", "am", "gear")

mtcars |>
  group_split(cyl) |>
  map2(.x = _,
       .y = !!!count_vars,
       .f = ~.x |>
         count(.y))

But this apparently doesn't work.

The expected outcome would be a list with one element per group and within each of these elements I would have another set of lists (one for each count var). I'd also be fine in getting a data frame within each group where the counts for each var are just row binded.

Any ideas?

Note: I do not want to have nested counts like in a simple count(as, vs, gear) command, instead I want to have three different "tables", one for each variable.

An example (doing it only for two vars could look like:

[[1]][1]
# A tibble: 2 × 2
     vs     n
  <dbl> <int>
1     0     1
2     1    10

[[1]][2]
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0     3
2     1     8

[[2]][1]
# A tibble: 2 × 2
     vs     n
  <dbl> <int>
1     0     3
2     1     4

[[2]][2]
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0     4
2     1     3

[[3]][1]
# A tibble: 1 × 2
     vs     n
  <dbl> <int>
1     0    14

[[3]][2]
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0    12
2     1     2

CodePudding user response:

After the group_split, loop over the list, select only the columns from the 'count_vars', then loop over the columns, create a tibble from the columns (as count expects a data.frame/tibble) and get the count - results in a nested list of tibbles i.e. for each 'cyl' split list, there would be 3 tibbles

library(dplyr)
library(purrr)
mtcars |> 
  group_split(cyl) |> 
  map(.x = _, ~ .x |> 
       select(all_of(count_vars)) |> 
       imap(~ tibble(!!.y := .x) |> 
     count(across(everything()))))

-output

[1]]
[[1]]$vs
# A tibble: 2 × 2
     vs     n
  <dbl> <int>
1     0     1
2     1    10

[[1]]$am
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0     3
2     1     8

[[1]]$gear
# A tibble: 3 × 2
   gear     n
  <dbl> <int>
1     3     1
2     4     8
3     5     2


[[2]]
[[2]]$vs
# A tibble: 2 × 2
     vs     n
  <dbl> <int>
1     0     3
2     1     4

[[2]]$am
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0     4
2     1     3

[[2]]$gear
# A tibble: 3 × 2
   gear     n
  <dbl> <int>
1     3     2
2     4     4
3     5     1


[[3]]
[[3]]$vs
# A tibble: 1 × 2
     vs     n
  <dbl> <int>
1     0    14

[[3]]$am
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0    12
2     1     2

[[3]]$gear
# A tibble: 2 × 2
   gear     n
  <dbl> <int>
1     3    12
2     5     2

Or another option is to use nest and store it in a single data as list column

out <-  mtcars %>% 
  select(cyl, all_of(count_vars)) %>%
  nest(data = all_of(count_vars)) %>%
  mutate(data =  map(data, ~ .x %>% 
       split.default(names(.x)) %>%
        map(~ .x %>%
       count(across(everything())))))

-output

> out
# A tibble: 3 × 2
    cyl data            
  <dbl> <list>          
1     6 <named list [3]>
2     4 <named list [3]>
3     8 <named list [3]>
> out$data[[1]]
$am
# A tibble: 2 × 2
     am     n
  <dbl> <int>
1     0     4
2     1     3

$gear
# A tibble: 3 × 2
   gear     n
  <dbl> <int>
1     3     2
2     4     4
3     5     1

$vs
# A tibble: 2 × 2
     vs     n
  <dbl> <int>
1     0     3
2     1     4

CodePudding user response:

Does this dplyr approach produce the output you're looking for?

mtcars %>%
  group_by(cyl) %>%
  count(vs, am, gear)

If not, can you please show an example of what you hope to see?

Edit:

mtcars %>%
  group_by(cyl) %>%
  add_tally(vs, name = 'vs') %>%
  add_tally(am, name = 'am') %>%
  add_tally(gear, name = 'gear') %>%
  select(cyl, vs, am, gear) %>%
  distinct()
  • Related