Home > database >  Python - Creating Manager Hierarchy Columns with Employee Names and Manager Email columns
Python - Creating Manager Hierarchy Columns with Employee Names and Manager Email columns

Time:10-27

Am trying to create manager hierarchy columns using just Employee ID and Manager Email Address columns. The Name column has unique values while there are duplicates in the Mgr_Email column (ie each employee has one manager but a manager could have multiple reports in their organisation).

The data looks like this

Name         Mgr_Email          

Sally Po     [email protected]
Sean Sea     [email protected]   
Jacob Hin    [email protected] 
Tim Buick    [email protected]
Kris Olt     [email protected]
Cindy Myers  [email protected]

and the desired outcome is this. There could be many levels of manager hierarchy and not just 4 hierarchy columns shown in the example.

Name         Mgr_Email          Mgr_Lvl_01           Mgr_lvl_02      Mgr_lvl_03       Mgr_lvl_04

Sally Po     [email protected]  [email protected]  
Sean Sea     [email protected]    [email protected]  [email protected]
Jacob Hin    [email protected]   [email protected]  [email protected] [email protected] [email protected] 
Tim Buick    [email protected]    [email protected]  [email protected] [email protected]
Kris Olt     [email protected] [email protected]  [email protected] [email protected]
Cindy Myers  [email protected]    [email protected]  [email protected]

I have tried this but it does not work

i=1
df['Level 0'] = df['Manager Email Address']

while df.notna().sum().ne(1).all():
    df[f'Mgr_Lvl {i}'] = df[f'Mgr_Lvl {i-1}'].map(df.set_index('Name')['Mgr_Email'])
    i =1

df = df.drop('Level 0',axis=1)
df['Mgr_Lvl_01'] = df.loc[:,f'Mgr_Level {i-1}'].ffill().bfill()

Appreciate any help I could get, thank you.

CodePudding user response:

You cannot solve this easily with enter image description here

A useful tool is networkx:

# make email address from Name
# (best would be to already have an identifier to map names)
df['Email'] = df['Name'].str.lower().str.replace(r'(\w ) (\w )', r'\1.\[email protected]', regex=True)

import networkx as nx

# create graph
G = nx.from_pandas_edgelist(df, source='Mgr_Email', target='Email',
                            create_using=nx.DiGraph)

# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]
# ['[email protected]']

# for each employee, find the hierarchy 
df2 = (pd.DataFrame([next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1]
                     for node in df['Email']], index=df.index)
         .rename(columns=lambda x: f'Mgr_Lvl_{x 1:02d}')
      )

# join to original DataFrame
out = df.drop(columns='Email').join(df2)

output:

          Name            Mgr_Email          Mgr_Lvl_01        Mgr_Lvl_02           Mgr_Lvl_03         Mgr_Lvl_04
0     Sally Po   [email protected]  [email protected]              None                 None               None
1     Sean Sea     [email protected]  [email protected]  [email protected]                 None               None
2    Jacob Hin    [email protected]  [email protected]  [email protected]     [email protected]  [email protected]
3    Tim Buick     [email protected]  [email protected]  [email protected]     [email protected]               None
4     Kris Olt  [email protected]  [email protected]  [email protected]  [email protected]               None
5  Cindy Myers     [email protected]  [email protected]  [email protected]                 None               None
  • Related