How to get connection from every leaf to every node in all paths from leaf to root in Networkx

Time:09-28

I have been working on this problem for a little while now. I asked this question before, but in the context of using only plain Python. I got a great answer from trincot in this thread (the problem is also described in detail here):
Visual Example

I load the data into pandas dataframe and delete the connections where Parent and Child are the same:

import networkx as nx
import pandas as pd
data = pd.read_csv('data.txt', sep=" ", header=0)
data = data[data['child'] != data['Parent']]

then I create a DiGraph from my dataframe:

G = nx.from_pandas_edgelist(data, source = 'Parent', target = 'child', create_using = nx.DiGraph())

And I try to pick out the roots and leaves:

roots = (v for v, d in G.in_degree() if d==0)
leaves = (v for v, d in G.out_degree() if d ==0)
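A small sketch of what these two degree filters select, using a toy graph whose node names are made up for illustration. It builds lists rather than generator expressions, since a generator can only be iterated once. Which of the two sets counts as the "leaves" depends on which way the edges point in your data, so it is worth checking on a tiny example like this one first.

```python
import networkx as nx

# Toy graph with hypothetical node names; each edge points Parent -> child.
G = nx.DiGraph([("P", "M"), ("M", "C1"), ("M", "C2")])

# Nodes with no incoming edges: they never appear as a child.
no_incoming = [v for v, d in G.in_degree() if d == 0]
# Nodes with no outgoing edges: they never appear as a Parent.
no_outgoing = [v for v, d in G.out_degree() if d == 0]

print(no_incoming)
print(no_outgoing)
```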

And now I'm stuck. I have two ideas. The first is to take every possible path from the leaves to the roots and print only the leaf -> node connections along each path, going from leaf to root. The second is to take every node, check whether it lies on a path from a leaf to a root, find its leaf, and print leaf -> node. The output for the example data should look like this:

ANALYTICAL_BALANCE BFG_DEPOSIT

ANALYTICAL_BALANCE BFG_2056
ANALYTICAL_BALANCE FFD_15
CUSTOM_DATA AND_11
CUSTOM_DATA DICT_DEAL
CUSTOM_DATA BFG_2056
CUSTOM_DATA FFD_15

CUSTOMER_DETAIL BALANCE
CUSTOMER_DETAIL BFG_16
CUSTOMER_DETAIL STAT_HIST

If you have any other ideas on how I can solve this, or know a library that would be better suited, feel free to write. Any help would be greatly appreciated!

Answer:

OK, so I guess the answer to my problem was really simple. The whole code looks like this:

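import networkx as nx
import pandas as pd

data = pd.read_csv('data.txt', sep=" ", header=0)
# Drop rows where a node is listed as its own parent (self-loops).
data = data[data['child'] != data['Parent']]
G = nx.from_pandas_edgelist(data, source='Parent', target='child', create_using=nx.DiGraph())
# "Leaves" here are the nodes with no incoming edges.
leaves = [v for v, d in G.in_degree() if d == 0]
for leaf in leaves:
    for node in G.nodes():
        # Print every non-leaf node that can be reached from this leaf.
        if node not in leaves and nx.has_path(G, leaf, node):
            print(leaf, node)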

The idea behind this code is really simple. I take every leaf in the forest and check whether a path exists from that leaf to each node. Of course, the leaves are nodes too, so when I come across one I skip it. It may be a little naive, but it works great.
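For larger graphs, one possible variation on the loop above is to ask NetworkX for everything reachable from a leaf in a single call, instead of testing each (leaf, node) pair with has_path. The sketch below uses nx.descendants for that. The toy edges and node names are made up for illustration, and it has not been run against the original data.txt.

```python
import networkx as nx

# Toy edges with hypothetical names, directed the same way as in the code above.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("D", "B"), ("D", "E")])

# Start nodes: no incoming edges (what the code above calls "leaves").
leaves = {v for v, d in G.in_degree() if d == 0}

pairs = []
for leaf in sorted(leaves):
    # nx.descendants returns every node reachable from `leaf`, excluding `leaf` itself.
    reachable = nx.descendants(G, leaf) - leaves
    for node in sorted(reachable):
        pairs.append((leaf, node))
        print(leaf, node)
```

This should print the same leaf/node pairs as the has_path version, sorted here only to make the order predictable, while traversing the graph once per leaf rather than once per pair.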
