I'm stuck with this presumably easy task.
I have a data set that I want to group by on a certain variable and then for each of these groups, I want to get the count for several vars. I think that's a job for map
, but I can't get my head around how that should look like. My idea was sth. like:
library(tidyverse)
count_vars <- c("vs", "am", "gear")
mtcars |>
group_split(cyl) |>
map2(.x = _,
.y = !!!count_vars,
.f = ~.x |>
count(.y))
But this apparently doesn't work.
The expected outcome would be a list with one element per group and within each of these elements I would have another set of lists (one for each count var). I'd also be fine in getting a data frame within each group where the counts for each var are just row binded.
Any ideas?
Note: I do not want to have nested counts like in a simple count(as, vs, gear)
command, instead I want to have three different "tables", one for each variable.
An example (doing it only for two vars could look like:
[[1]][1]
# A tibble: 2 × 2
vs n
<dbl> <int>
1 0 1
2 1 10
[[1]][2]
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 3
2 1 8
[[2]][1]
# A tibble: 2 × 2
vs n
<dbl> <int>
1 0 3
2 1 4
[[2]][2]
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 4
2 1 3
[[3]][1]
# A tibble: 1 × 2
vs n
<dbl> <int>
1 0 14
[[3]][2]
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 12
2 1 2
CodePudding user response:
After the group_split
, loop over the list
, select
only the columns from the 'count_vars', then loop over the columns, create a tibble
from the columns (as count
expects a data.frame/tibble) and get the count
- results in a nested list of tibbles i.e. for each 'cyl' split list, there would be 3 tibbles
library(dplyr)
library(purrr)
mtcars |>
group_split(cyl) |>
map(.x = _, ~ .x |>
select(all_of(count_vars)) |>
imap(~ tibble(!!.y := .x) |>
count(across(everything()))))
-output
[1]]
[[1]]$vs
# A tibble: 2 × 2
vs n
<dbl> <int>
1 0 1
2 1 10
[[1]]$am
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 3
2 1 8
[[1]]$gear
# A tibble: 3 × 2
gear n
<dbl> <int>
1 3 1
2 4 8
3 5 2
[[2]]
[[2]]$vs
# A tibble: 2 × 2
vs n
<dbl> <int>
1 0 3
2 1 4
[[2]]$am
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 4
2 1 3
[[2]]$gear
# A tibble: 3 × 2
gear n
<dbl> <int>
1 3 2
2 4 4
3 5 1
[[3]]
[[3]]$vs
# A tibble: 1 × 2
vs n
<dbl> <int>
1 0 14
[[3]]$am
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 12
2 1 2
[[3]]$gear
# A tibble: 2 × 2
gear n
<dbl> <int>
1 3 12
2 5 2
Or another option is to use nest
and store it in a single data as list
column
out <- mtcars %>%
select(cyl, all_of(count_vars)) %>%
nest(data = all_of(count_vars)) %>%
mutate(data = map(data, ~ .x %>%
split.default(names(.x)) %>%
map(~ .x %>%
count(across(everything())))))
-output
> out
# A tibble: 3 × 2
cyl data
<dbl> <list>
1 6 <named list [3]>
2 4 <named list [3]>
3 8 <named list [3]>
> out$data[[1]]
$am
# A tibble: 2 × 2
am n
<dbl> <int>
1 0 4
2 1 3
$gear
# A tibble: 3 × 2
gear n
<dbl> <int>
1 3 2
2 4 4
3 5 1
$vs
# A tibble: 2 × 2
vs n
<dbl> <int>
1 0 3
2 1 4
CodePudding user response:
Does this dplyr approach produce the output you're looking for?
mtcars %>%
group_by(cyl) %>%
count(vs, am, gear)
If not, can you please show an example of what you hope to see?
Edit:
mtcars %>%
group_by(cyl) %>%
add_tally(vs, name = 'vs') %>%
add_tally(am, name = 'am') %>%
add_tally(gear, name = 'gear') %>%
select(cyl, vs, am, gear) %>%
distinct()