So I have been working on this problem for a little while now. I asked this question before, but in a context of only using python. I got great answer from trincot in this thread(also the problem is described in detail here):
I load the data into pandas dataframe and delete the connections where Parent and Child are the same:
data = pd.read_csv('data.txt', sep=" ", header=0)
data = data[data['child'] != data['Parent]]
then I create a DiGraph from my dataframe:
G = nx.from_pandas_edgelist(data, source = 'Parent', target = 'child', create_using = nx.DiGraph())
And I try to sift for a roots and leaves:
roots = (v for v, d in G.in_degree() if d==0)
leaves = (v for v, d in G.out_degree() if d ==0)
aaaaaand I'm stuck. I have two ideas. Take every possible path from leaves to roots and then only print leaf -> node connection starting from leaf to root or take every node, check if it exists in a path from leaves to roots, take It's leaf and print leaf -> node. The output for the example data will be like this:
ANALYTICAL_BALANCE BFG_DEPOSIT
ANALYTICAL_BALANCE BFG_2056
ANALYTICAL_BALANCE FFD_15
CUSTOM_DATA AND_11
CUSTOM_DATA DICT_DEAL
CUSTOM_DATA BFG_2056
CUSTOM_DATA FFD_15
CUSTOMER_DETAIL BALANCE
CUSTOMER_DETAIL BFG_16
CUSTOMER_DETAIL STAT_HIST
If you maybe have any other ideas on how I can solve this or what library may be better for this feel free to write. Any help would be greatly appreciated!
CodePudding user response:
Ok so I guess that the answer to my problem was really simple. The whole code looks like this:
data = pd.read_csv('data.txt', sep=" ", header=0)
data = data[data['child'] != data['Parent']]
G = nx.from_pandas_edgelist(data, source = 'Parent', target = 'child', create_using = nx.DiGraph())
leaves = list((v for v, d in G.in_degree() if d==0))
nodes = G.nodes()
for leaf in leaves:
for node in nodes:
if nx.has_path(G,leaf, node) and node not in leaves:
print(leaf, node)
Idea behind this code is really simple. I just take every leaf from a forest and check if there exists a path from leaf to every node. Ofcourse the leafs are also nodes so when I see one I pass it. It may be a little naive, but works great.