Home > Blockchain >  Group by and summarize conditionally
Group by and summarize conditionally

Time:09-29

My data table df shows employees, their employers, and whether they are hybrid workers. It looks like this:

name firm hybrid
Mike A    1
Kate A    1 
Dan  C    0
Jess B    0
Mick C    1
Nate C    0 
Mike A    1

I want to have a table that shows the number of employees and number of hybrid employees by firm:

firm total hybrid
A    3     3
B    1     0
C    3     1

What I tried so far is group_by (firm) %>% summarize (total = length(unique(name)), hybrid = length(unique(df[hybrid == 1]))) However, this approach gave me one observation per employee instead of firm. Does anyone know a better way to do this?

CodePudding user response:

You need to use name[hybrid == 1] instead of df[hybrid == 1]. Note that dplyr has a wrapper function for length(unique(x)), n_distinct.

library(dplyr)
df %>% 
  group_by(firm) %>% 
  summarise(total = n_distinct(name),
            hybrid = n_distinct(name[hybrid == 1]))

output

# A tibble: 3 × 3
  firm  total hybrid
  <chr> <int>  <int>
1 A         2      2
2 B         1      0
3 C         3      1

CodePudding user response:

Using data.table

library(data.table)
 setDT(df1)[, .(total = uniqueN(name), hybrid = uniqueN(name[hybrid == 1])), firm]
     firm total hybrid
   <char> <int>  <int>
1:      A     2      2
2:      C     3      1
3:      B     1      0
  • Related