Check circular reference within DataFrame (Python)

Time:10-08

I'm working with a set of employee data in which every employee reports to a manager. In the DataFrame, each employee is identified by an ID, and each ID has a parent ID (the manager's ID). Is there a way to check whether any employee's reporting line leads back to themselves?

Example data frame:

import pandas as pd
df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})

In this example, employee 111 reports to 112, 112 reports to 113, and 113 reports to 111, so the reporting line forms a circular reference. Is there a way to check for this kind of circular reference?

Thank you very much!

CodePudding user response:

This is a graph problem, and a perfect use case for networkx.

Create a directed graph from the DataFrame and use nx.simple_cycles to identify the circular references:

import networkx as nx

G = nx.from_pandas_edgelist(df, source='parentid', target='id',
                            create_using=nx.DiGraph)

list(nx.simple_cycles(G))

output: [[112, 111, 113]]
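To run this end to end, here is a minimal self-contained sketch using the DataFrame from the question. Sorting the members of each cycle is an addition of this sketch (not part of the answer's code), so that the printed result does not depend on which node networkx happens to start from. nx.is_directed_acyclic_graph is used as a plain yes/no check.

```python
import networkx as nx
import pandas as pd

# DataFrame from the question: each id reports to its parentid.
df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})

# Edges point from manager (parentid) to employee (id), as in the answer.
G = nx.from_pandas_edgelist(df, source="parentid", target="id",
                            create_using=nx.DiGraph)

# Sort the members of each cycle so the result is independent of the
# node that simple_cycles starts from (this discards the edge direction).
cycles = sorted(sorted(int(n) for n in cycle) for cycle in nx.simple_cycles(G))
print(cycles)  # [[111, 112, 113]]

# A reporting structure without circular references is a DAG.
print(nx.is_directed_acyclic_graph(G))  # False
```

If only a yes/no answer is needed, the is_directed_acyclic_graph check alone is enough and avoids enumerating every cycle.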

If you want to label the circular nodes, you can further use:

# Collect every node that appears in at least one cycle
circular = {n for cycle in nx.simple_cycles(G) for n in cycle}

df['circular'] = df['id'].isin(circular)

output (on a more complex example):

    id  parentid  circular
0  111       112      True
1  112       113      True
2  113       111      True
3  210       211     False
4  211       212     False
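If installing networkx is not an option, a plain-Python walk up the reporting chain gives the same flags for this data. Note that this is a different technique from the answer above, not the algorithm networkx uses, and it assumes every id appears only once (one manager per employee). The function name in_reporting_cycle and the manager_of dict are hypothetical names chosen for this sketch; the five rows are read off the output table above.

```python
def in_reporting_cycle(start, manager_of):
    """Return True if following managers from `start` leads back to `start`."""
    seen = set()
    current = manager_of.get(start)
    # Stop at the top of the hierarchy (no manager recorded) or when the
    # walk revisits a node, which means it is stuck in a cycle that
    # does not contain `start`.
    while current is not None and current not in seen:
        if current == start:
            return True
        seen.add(current)
        current = manager_of.get(current)
    return False


# Assumed input: id -> parentid, taken from the table above.
# With pandas this could be built as dict(zip(df["id"], df["parentid"])).
manager_of = {111: 112, 112: 113, 113: 111, 210: 211, 211: 212}

flags = {emp: in_reporting_cycle(emp, manager_of) for emp in manager_of}
print(flags)
# {111: True, 112: True, 113: True, 210: False, 211: False}
```

An employee who merely reports into a cycle without being part of it (for example, a hypothetical 114 reporting to 111) is flagged False, which matches what the simple_cycles approach labels.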