I'm working on a set of employee data where every employee reports to a manager. In the DataFrame, each employee is represented by an ID, and each ID has a parent ID (the manager's ID). Is there a way to check whether any employee's reporting line leads back to themselves?
Example data frame:
import pandas as pd
df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})
In this example employee 111 reports to 112, 112 reports to 113, 113 reports to 111. The line becomes a circular reference. Is there a way to check for this kind of circular reference?
Thank you very much!
Answer:
This is a perfect use case for networkx. Create a directed graph from the id/parentid pairs and use simple_cycles to identify the circular references:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='parentid', target='id',
                            create_using=nx.DiGraph)
list(nx.simple_cycles(G))
output: [[112, 111, 113]]
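For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch using the example data from the question. Converting each cycle to a set is my own addition, not part of the original answer: I am assuming the node order inside a reported cycle may differ between networkx versions, so sets make the result easier to compare.

```python
import networkx as nx
import pandas as pd

# Example data from the question: 111 -> 112 -> 113 -> 111
df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})

# One directed edge per row, pointing from manager (parentid) to employee (id)
G = nx.from_pandas_edgelist(df, source="parentid", target="id",
                            create_using=nx.DiGraph)

# simple_cycles yields each cycle as a list of nodes; store them as sets
# so the starting node of the cycle does not matter
cycles = [set(cycle) for cycle in nx.simple_cycles(G)]
print(cycles)
```

An empty list here would mean that no reporting line loops back on itself.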
If you want to label the circular nodes, you can further use:
circular = {n for l in nx.simple_cycles(G) for n in l}
df['circular'] = df['id'].isin(circular)
output (on a more complex example):
    id  parentid  circular
0  111       112      True
1  112       113      True
2  113       111      True
3  210       211     False
4  211       212     False
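If adding networkx is not an option, the same labels can probably be obtained by following the parent pointers directly. The sketch below is an alternative technique, not the approach from the answer above. It assumes every id appears only once (each employee has a single manager), and the function name circular_ids is hypothetical. The data is my reconstruction of the "more complex example" from the output table.

```python
import pandas as pd

def circular_ids(df):
    """Return the set of ids whose reporting line leads back to themselves.

    Hypothetical helper; assumes each id occurs once in df["id"].
    """
    parent = dict(zip(df["id"], df["parentid"]))
    circular = set()
    for start in parent:
        seen = set()
        node = start
        # Walk up the chain until we leave the table or revisit a node
        while node in parent and node not in seen:
            seen.add(node)
            node = parent[node]
        # Only ids that are on a cycle end the walk back at themselves;
        # ids that merely report into a cycle stop at a different node
        if node == start:
            circular.add(start)
    return circular

df = pd.DataFrame({"id": [111, 112, 113, 210, 211],
                   "parentid": [112, 113, 111, 211, 212]})
df["circular"] = df["id"].isin(circular_ids(df))
print(df)
```

This walks one chain per employee, so it can take quadratic time on very deep hierarchies. For large data sets the networkx version is likely the better choice.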