Home > database >  Build org hierarchy
Build org hierarchy

Time:08-31

having a dataset in a DataFrame similar to this one:

name    manager    ...    sale
john    N/A               100
anna    john              120
ben     john              110
cecilia ben               130
david   clark             150
...
zain    cecilia           450

I'd like to create a flat table with all the hierarchies, by specifying a root manager

e.g. buildHierarchy('john', 3) would give me a table 3 levels deep like the one below

root      manager1    manager2    manager3    sale
john                                          110
john      anna                                120
john      ben                                 110 
john      ben         cecilia                 130
john      ben         cecilia     zain        450

CodePudding user response:

This is a graph problem.

Here is your graph:

hierarchy

You can use networkx:

import networkx as nx

def buildHierarchy(root, n):
    G = nx.from_pandas_edgelist(df, source='manager', target='name',
                                create_using=nx.DiGraph)
    G.remove_node(np.nan)

    paths = [nx.shortest_path(G, root, name)
             for i in range(n 1)
             for name in nx.descendants_at_distance(G, source=root, distance=i)
             ]

    df2 = pd.DataFrame(paths).iloc[:, 1:].fillna('').add_prefix('manager')
    df2.insert(0, 'root', root)
    
    df2['sales'] = (df.set_index('name')
                      .loc[[x[-1] for x in paths], 'sale']
                      .to_numpy()
                   )
    
    return df2

out = buildHierarchy('john', 3)

output:

   root manager1 manager2 manager3  sales
0  john                               100
1  john      ben                      110
2  john     anna                      120
3  john      ben  cecilia             130
4  john      ben  cecilia     zain    450

other example:

buildHierarchy('ben', 1)

  root manager1  sales
0  ben             110
1  ben  cecilia    130
  • Related