having a dataset in a DataFrame similar to this one:
name manager ... sale
john N/A 100
anna john 120
ben john 110
cecilia ben 130
david clark 150
...
zain cecilia 450
I'd like to create a flat table with all the hierarchies, by specifying a root manager
e.g. buildHierarchy('john', 3)
would give me a table 3 levels deep like the one below
root manager1 manager2 manager3 sale
john 110
john anna 120
john ben 110
john ben cecilia 130
john ben cecilia zain 450
CodePudding user response:
This is a graph problem.
Here is your graph:
You can use networkx
:
import networkx as nx
def buildHierarchy(root, n):
G = nx.from_pandas_edgelist(df, source='manager', target='name',
create_using=nx.DiGraph)
G.remove_node(np.nan)
paths = [nx.shortest_path(G, root, name)
for i in range(n 1)
for name in nx.descendants_at_distance(G, source=root, distance=i)
]
df2 = pd.DataFrame(paths).iloc[:, 1:].fillna('').add_prefix('manager')
df2.insert(0, 'root', root)
df2['sales'] = (df.set_index('name')
.loc[[x[-1] for x in paths], 'sale']
.to_numpy()
)
return df2
out = buildHierarchy('john', 3)
output:
root manager1 manager2 manager3 sales
0 john 100
1 john ben 110
2 john anna 120
3 john ben cecilia 130
4 john ben cecilia zain 450
other example:
buildHierarchy('ben', 1)
root manager1 sales
0 ben 110
1 ben cecilia 130