My data table df
shows employees, their employers, and whether they are hybrid workers. It looks like this:
name firm hybrid
Mike A 1
Kate A 1
Dan C 0
Jess B 0
Mick C 1
Nate C 0
Mike A 1
I want to have a table that shows the number of employees and number of hybrid employees by firm:
firm total hybrid
A 3 3
B 1 0
C 3 1
What I tried so far is group_by (firm) %>% summarize (total = length(unique(name)), hybrid = length(unique(df[hybrid == 1])))
However, this approach gave me one observation per employee instead of firm.
Does anyone know a better way to do this?
CodePudding user response:
You need to use name[hybrid == 1]
instead of df[hybrid == 1]
. Note that dplyr
has a wrapper function for length(unique(x))
, n_distinct
.
library(dplyr)
df %>%
group_by(firm) %>%
summarise(total = n_distinct(name),
hybrid = n_distinct(name[hybrid == 1]))
output
# A tibble: 3 × 3
firm total hybrid
<chr> <int> <int>
1 A 2 2
2 B 1 0
3 C 3 1
CodePudding user response:
Using data.table
library(data.table)
setDT(df1)[, .(total = uniqueN(name), hybrid = uniqueN(name[hybrid == 1])), firm]
firm total hybrid
<char> <int> <int>
1: A 2 2
2: C 3 1
3: B 1 0