Consider this data frame.
question <- data.frame("Product" = c("P001", "P001", "P001", "P002", "P002", "P002"),
"Activity" = c("sawing", "planning", "opening", "sawing", "planning", "opening"),
"Employee" = c("Tom", "Bert", "Louisa", "Bert", "Louisa", "Louisa"))
Product | Activity | Employee |
---|---|---|
P001 | sawing | Tom |
P001 | planning | Bert |
P001 | opening | Louisa |
P002 | sawing | Bert |
P002 | planning | Louisa |
P002 | opening | Louisa |
I would like to do 3 summarisations in one new table:
- Group by activity and aggregate the number of activities;
- Group by activity and summarise the number of employees that can do a specific process;
- Group by product and summarise the number of employees that do a specific process, averaged per product.
It's the last summarisation that I can't figure out, since I need a different "group_by" for this. What I have so far is:
days <- question %>%
group_by(Activity) %>%
summarise("Number of activities" = n(),
"Number of employees" = length(unique(Employee)))
So for example, as a result for the last summarisation, I would like to get:
Activity | Number of activities | Number of employees | Average number of employees per activity per product |
---|---|---|---|
opening | 2 | 1 | 1 |
planning | 2 | 2 | 1 |
sawing | 2 | 2 | 1 |
How do I do the last step/column?
CodePudding user response:
One option is to calculate the number of employees per product separately, and join this to the original dataset before calculating the final summary:
summary1 <- question %>%
group_by(Activity, Product) %>%
summarize(n_employees_per_prod_act = n(), .groups = "drop")
question %>%
left_join(summary1, by = c("Activity", "Product")) %>%
group_by(Activity) %>%
summarise("Number of activities" = n(),
"Number of employees" = length(unique(Employee)),
"Average number of employees per activity per product" = mean(n_employees_per_prod_act))
# A tibble: 3 x 4
Activity `Number of activ~ `Number of emplo~ `Average number of~
<chr> <int> <int> <dbl>
1 opening 2 1 1
2 planning 2 2 1
3 sawing 2 2 1
CodePudding user response:
Here is my attempt. Please notice that I changed Number of activties
to Number of products
in the expected output.
question %>%
nest_by(Activity, Product) |>
transmute(uempl = unique(data$Employee),
uempp = length(uempl)) |>
group_by(Activity) |>
summarize("Number of employees" = length(unique(uempl)),
"Average number of employees per activity per product" = mean(uempp),
"Number of products" = n())
# A tibble: 3 × 4
Activity `Number of employees` `Average number of employees…` `Number of pro…`
<chr> <int> <dbl> <int>
1 opening 1 1 2
2 planning 2 1 2
3 sawing 2 1 2