Have a table of employees (key emp id) with one column being the boss (same key format). I was able to build out the hierarchy by doing repeated joins (took 10 for my real organization), but now I need to add a column for:
- employee_count (count of rows with the boss_id = emp_id of current row)
- empire_count (count of rows with the hierarchy starting the same way)
I'm most comfortable in the Tidyverse, here's some simplified fake data of what I have so far as example:
library(tidyverse)
employees = tibble(
emp_id = c(1,2,3,4,5,6,7),
emp_name = c('BigBoss','MedBoss','MedBoss2','Emp1','Emp2','Emp3','Emp4'),
boss_name = c('','BigBoss','BigBoss','MedBoss','MedBoss','MedBoss2','MedBoss2'),
hierarchy = c('','BigBoss','BigBoss','BigBoss>MedBoss','BigBoss>MedBoss','BigBoss>MedBoss2','BigBoss>MedBoss2')
)
Which looks like this:
# A tibble: 7 × 4
emp_id emp_name boss_name hierarchy
<dbl> <chr> <chr> <chr>
1 1 BigBoss "" ""
2 2 MedBoss "Bigboss" "BigBoss"
3 3 MedBoss2 "BigBoss" "BigBoss"
4 4 Emp1 "MedBoss" "BigBoss>MedBoss"
5 5 Emp2 "MedBoss" "BigBoss>MedBoss"
6 6 Emp3 "MedBoss2" "BigBoss>MedBoss2"
7 7 Emp4 "MedBoss2" "BigBoss>MedBoss2"
As far as what I'm looking for, employee_count
should be 2 for each of the MedBosses and BigBoss, and then the empire_count
for BigBoss would be 6.
For the employee_count
piece, I could separately do a:
data %>% group_by(boss_name) %>% summarize(employee_count=n(emp_id))
and then join it back, but then the hierarchy wouldn't work the same way... I think the answer is some map function from purrr or creating a function, and Vectorize()'ing it and calling within a mutate, but that hasn't worked for me.
This is as close as I can get...
# Function to get the count of employees for a boss_name
get_employee_count = function(table,bossname) table %>% filter(boss_name==bossname) %>% nrow()
# This call works (returns 2)
get_employee_count(employees,'BigBoss')
# Try to add the count in via a mutate (returns 7 for each)
employees %>%
mutate(employee_count=get_employee_count(.,boss_name))
If I can get that to work, I think I could figure out the harder piece as I could do it also as a function.
CodePudding user response:
After a lot of trial and error, I can do it if I hard-code the dataframe name, which is close enough for me.
# This is the solution
employees %>%
mutate(employee_count = map_int(emp_name,function(name) employees %>% filter(boss_name==name) %>% nrow())) %>%
mutate(empire_count = map_int(emp_name,function(name) employees %>% filter(str_detect(hierarchy,name)) %>% nrow()))
CodePudding user response:
Definitely a graph/tree problem. For employee count, just find all the adjacent nodes. For empire_count, enumerate all the simple paths and count the unique nodes. All easy enough with igraph
library(igraph)
empTree <- graph_from_data_frame(
employees |>
filter(boss_name != "") |>
select(from = emp_name, to = boss_name)
)
Count_Empire <- function(node) {
paths <- all_simple_paths(empTree, node, mode = "in")
if(length(paths) == 0) return(0)
ct <- paths |>
map(~as.vector(.x)) |>
reduce(c) |>
unique() |>
length()
# minus 1 for self
ct - 1
}
Count_Employees <- function(node) {
adjacent_vertices(empTree, node, mode = "in")[[1]] |>
as.vector() |>
length()
}
employees |>
mutate(
employee_count = map_dbl(emp_name, Count_Employees),
empire_count = map_dbl(emp_name, Count_Empire)
)