I need some help counting observations meeting certain criteria by group. I first want the number of employees by location as a column. Then I would like to retrieve the number of employees that have worked more than 40 hours (by location) and summarize that into a column. I assume there is an easy way to do it with dplyr or base R but I'm stumped. My data is below.
name hours_worked location
Bob 55 IL
Nick 25 IL
Sally 30 IL
Patricia 50 WI
Tim 35 WI
Liz 42 OH
Brad 60 OH
Sam 48 OH
Ideal output would be something like:
location headcount over_40
IL 3 1
WI 2 1
OH 3 3
CodePudding user response:
We can do a group by operation - grouped by 'location' get the number of rows (n()
) for headcount and the sum
of logical vector to get the count of 'over_40'
library(dplyr)
df1 %>%
group_by(location) %>%
summarise(headcount = n(), over_40 = sum(hours_worked > 40))
-output
# A tibble: 3 x 3
location headcount over_40
<chr> <int> <int>
1 IL 3 1
2 OH 3 3
3 WI 2 1
data
df1 <- structure(list(name = c("Bob", "Nick", "Sally", "Patricia", "Tim",
"Liz", "Brad", "Sam"), hours_worked = c(55L, 25L, 30L, 50L, 35L,
42L, 60L, 48L), location = c("IL", "IL", "IL", "WI", "WI", "OH",
"OH", "OH")), class = "data.frame", row.names = c(NA, -8L))