I'm trying to count the number of students of each gender by class, and I also want the number of students of each gender overall. The desired output is a single object that contains both the overall and the by-class gender breakdowns.
I have working code (below) that does this, but I wasn't sure whether there is a more streamlined way to do it without creating an intermediate object and joining the two together.
library(dplyr)
#Sample dataset
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))
#My code to accomplish this task now (produces desired output but curious if there's a more efficient method)
gender_by_class <- test_data %>%
  distinct(id, class, gender) %>%
  group_by(class) %>%
  count(gender) %>%
  ungroup()
gender_overall <- test_data %>%
  distinct(id, gender) %>%
  count(gender) %>%
  mutate(class = "overall") %>%
  full_join(gender_by_class)
CodePudding user response:
Similar to @Quinten's approach, but with n_distinct:
library(dplyr)
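test_data %>%
  group_by(gender) %>%
  summarise(n = n_distinct(id), class = 'overall') %>%
  bind_rows(
    test_data %>%
      group_by(class, gender) %>%
      # .groups = "drop" returns an ungrouped result and silences the regrouping message
      summarise(n = n_distinct(id), .groups = "drop")
  )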
Output:
# A tibble: 7 × 3
gender n class
<chr> <int> <chr>
1 f 1 overall
2 m 2 overall
3 f 1 h
4 m 2 h
5 f 1 m
6 m 1 m
7 f 1 s
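To make the link to the question's code explicit, here is a minimal sketch that compares the two ways of getting the by-class counts: dropping duplicate rows with distinct() and then counting, versus counting unique ids per group with n_distinct(). The names via_distinct and via_n_distinct are only illustrative, and the check assumes each id has a single gender, as in the sample data.

```r
library(dplyr)

# Sample data copied from the question
test_data <- tibble(
  id     = c(1, 1, 2, 2, 2, 3, 3, 3),
  class  = c("h", "h", "m", "h", "s", "m", "h", "h"),
  gender = c("m", "m", "f", "f", "f", "m", "m", "m")
)

# Question's approach: remove duplicate rows first, then count rows per group
via_distinct <- test_data %>%
  distinct(id, class, gender) %>%
  count(class, gender)

# This answer's approach: count unique ids within each group directly
via_n_distinct <- test_data %>%
  group_by(class, gender) %>%
  summarise(n = n_distinct(id), .groups = "drop")

# Compare the two results as plain data frames
all.equal(as.data.frame(via_distinct), as.data.frame(via_n_distinct))
```

If an id could appear with more than one gender, the two approaches would still agree per group, but the overall counts would no longer sum to the number of students.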
CodePudding user response:
You could use bind_rows to have it in one pipe like this:
library(dplyr)
test_data %>%
  distinct(id, class, gender) %>%
  group_by(class) %>%
  count(gender) %>%
  ungroup() %>%
  bind_rows(., test_data %>%
              distinct(id, gender) %>%
              count(gender) %>%
              mutate(class = "overall"))
#> # A tibble: 7 × 3
#> class gender n
#> <chr> <chr> <int>
#> 1 h f 1
#> 2 h m 2
#> 3 m f 1
#> 4 m m 1
#> 5 s f 1
#> 6 overall f 1
#> 7 overall m 2
Created on 2023-01-29 with reprex v2.0.2
Thanks to @stefan, an even better option:
library(dplyr)
test_data %>%
  distinct(id, class, gender) %>%
  count(class, gender) %>%
  bind_rows(., test_data %>%
              distinct(id, gender) %>%
              count(class = "overall", gender))
#> # A tibble: 7 × 3
#> class gender n
#> <chr> <chr> <int>
#> 1 h f 1
#> 2 h m 2
#> 3 m f 1
#> 4 m m 1
#> 5 s f 1
#> 6 overall f 1
#> 7 overall m 2
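A possible variation on the same idea, not taken from any of the answers above: instead of counting twice and binding the results, stack a copy of the data with class relabelled as "overall" and count once. This is only a sketch under the same assumption of one gender per id, and the name combined is illustrative.

```r
library(dplyr)

# Sample data copied from the question
test_data <- tibble(
  id     = c(1, 1, 2, 2, 2, 3, 3, 3),
  class  = c("h", "h", "m", "h", "s", "m", "h", "h"),
  gender = c("m", "m", "f", "f", "f", "m", "m", "m")
)

# Append a copy of every row with class set to "overall", so a single
# distinct() + count() produces both the by-class and overall breakdowns
combined <- test_data %>%
  bind_rows(mutate(test_data, class = "overall")) %>%
  distinct(id, class, gender) %>%
  count(class, gender)

combined
```

The trade-off is that the data is duplicated before counting, and the "overall" rows are sorted in with the class values rather than placed at the end, so an extra arrange() would be needed if the position matters.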