I have a dataset with 20000 observations. I converted var 'worker' into a factor to find out how many workers there were (329 levels). Var org (the organisation workers belong to) is also a factor. The n column corresponds to the number of projects each worker completed with their org.
I need to know how many workers each org has (NOT projects/observations). How do I go about it?
> data %>% group_by(data$org, data$worker) %>% count(data$org)
# A tibble: 329 x 3
# Groups: data$org, data$worker[329]
`data$org` `data$worker` n
<fct> <fct> <int>
1 ccdi-3 0ba0867be1b6c27645801da785cd00 19
2 ccdi-3 0bee59c83cf1d6f269911722936d91 134
3 ccdi-3 0d3280866efc2d9634f448cafcb520 79
4 ccdi-3 12b7978ff64eab8022d858792c2235 135
5 ccdi-3 19f3bf29cddde4018769536ff96f2b 85
6 ccdi-3 1bd6a0f3b6c14a319923896c60ecc6 123
7 ccdi-3 20761d73cccc97d45a143b1f776e90 32
8 ccdi-3 2231d8b26e563b373c474003ba2f2c 124
9 ccdi-3 2530d02cb21de9541bf498410a9926 45
10 ccdi-3 2ac3cb439ace387a397398484297bd 3
# ... with 319 more rows
CodePudding user response:
You could group by org
and summarise
with n_distinct
:
library(tidyverse)
# Sample data
df <- tribble(
~org, ~worker, ~n,
"ccdi-3", "0ba0867be1b6c27645801da785cd00", 19,
"ccdi-3", "0bee59c83cf1d6f269911722936d91", 134,
"ccdi-3", "0d3280866efc2d9634f448cafcb520", 79,
"ccdi-3", "12b7978ff64eab8022d858792c2235", 135,
"ccdi-3", "19f3bf29cddde4018769536ff96f2b", 85,
"ccdi-3", "1bd6a0f3b6c14a319923896c60ecc6", 123
)
df |>
group_by(org) |>
summarise(n_workers = n_distinct(worker)) |>
ungroup()
#> # A tibble: 1 × 2
#> org n_workers
#> <chr> <int>
#> 1 ccdi-3 6
Created on 2022-07-18 by the reprex package (v2.0.1)