Home > database >  How do I sum unique observations (var levels) for each level of another variable?
How do I sum unique observations (var levels) for each level of another variable?

Time:07-18

I have a dataset with 20000 observations. I converted var 'worker' into a factor to find out how many workers there were (329 levels). Var org (the organisation workers belong to) is also a factor. The n column corresponds to the number of projects each worker completed with their org.

I need to know how many workers each org has (NOT projects/observations). How do I go about it?

    > data %>% group_by(data$org, data$worker) %>% count(data$org)
# A tibble: 329 x 3
# Groups:   data$org, data$worker[329]
   `data$org`         `data$worker`                  n
   <fct>            <fct>                          <int>
 1 ccdi-3           0ba0867be1b6c27645801da785cd00    19
 2 ccdi-3           0bee59c83cf1d6f269911722936d91   134
 3 ccdi-3           0d3280866efc2d9634f448cafcb520    79
 4 ccdi-3           12b7978ff64eab8022d858792c2235   135
 5 ccdi-3           19f3bf29cddde4018769536ff96f2b    85
 6 ccdi-3           1bd6a0f3b6c14a319923896c60ecc6   123
 7 ccdi-3           20761d73cccc97d45a143b1f776e90    32
 8 ccdi-3           2231d8b26e563b373c474003ba2f2c   124
 9 ccdi-3           2530d02cb21de9541bf498410a9926    45
10 ccdi-3           2ac3cb439ace387a397398484297bd     3
# ... with 319 more rows

CodePudding user response:

You could group by org and summarise with n_distinct:

library(tidyverse)

# Sample data
df <- tribble(
  ~org, ~worker, ~n,
  "ccdi-3", "0ba0867be1b6c27645801da785cd00", 19,
  "ccdi-3", "0bee59c83cf1d6f269911722936d91", 134,
  "ccdi-3", "0d3280866efc2d9634f448cafcb520", 79,
  "ccdi-3", "12b7978ff64eab8022d858792c2235", 135,
  "ccdi-3", "19f3bf29cddde4018769536ff96f2b", 85,
  "ccdi-3", "1bd6a0f3b6c14a319923896c60ecc6", 123
)

df |> 
  group_by(org) |> 
  summarise(n_workers = n_distinct(worker)) |> 
  ungroup()

#> # A tibble: 1 × 2
#>   org    n_workers
#>   <chr>      <int>
#> 1 ccdi-3         6

Created on 2022-07-18 by the reprex package (v2.0.1)

  • Related