Identify the parent and child values in a DataFrame

Time:09-29

I spent almost half of my day trying to solve this...

I want to identify the values in the parent and child columns and lay them out as rows.

The values form a tree: a node that appears as a child in one row appears as a parent in a later row.

My sample data looks like this:

  |  Parent  |  Child  |
------------------------
0 |    a     |    b    |
1 |    b     |    c    |
2 |    b     |    d    |
3 |    c     |    e    |
4 |    c     |    f    |
5 |    f     |    g    |
6 |    d     |    h    |
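
For anyone who wants to reproduce this, the sample above can be built roughly as follows. This is only a sketch: the variable name df is an assumption, chosen because the answer further down refers to a DataFrame by that name.

```python
import pandas as pd

# Sample edge list from the table above: one (Parent, Child) pair per row.
# The name `df` is an assumption; the question does not name its DataFrame.
df = pd.DataFrame({
    "Parent": ["a", "b", "b", "c", "c", "f", "d"],
    "Child":  ["b", "c", "d", "e", "f", "g", "h"],
})

print(df)
```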

I want to transform it into this:

  |  Col1  |  Col2  |  Col3  |  Col4  |  Col5  |  Col6  |
----------------------------------------------------------
0 |   a    |   b    |   c    |   f    |    g   |   nan  |
1 |   a    |   b    |   c    |   e    |   nan  |   nan  |
2 |   a    |   b    |   d    |   h    |   nan  |   nan  |

I have tried looping to search for the next items, but it does not work.

Any help would be appreciated.

CodePudding user response:

You can approach this as a graph problem, using networkx.

Create all edges, find the roots and leaves, and compute the paths with all_simple_paths:

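import networkx as nx
import pandas as pd

# df is the Parent/Child DataFrame from the question.
# Build a directed graph with one edge per (Parent, Child) row.
G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph)

# Roots have no incoming edges; leaves have no outgoing edges.
roots = [n for n, d in G.in_degree() if d == 0]
leafs = [n for n, d in G.out_degree() if d == 0]

# One row per root-to-leaf path; shorter paths are padded with None.
df2 = pd.DataFrame([l for r in roots for l in nx.all_simple_paths(G, r, leafs)])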

output:

   0  1  2  3     4
0  a  b  c  e  None
1  a  b  c  f     g
2  a  b  d  h  None
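
If installing networkx is not an option, the same root-to-leaf enumeration can be sketched with the standard library alone. This is not the answer's method, just an illustrative depth-first walk; the function name root_to_leaf_paths and the edges list are hypothetical, and the sketch assumes the input has no cycles (a cycle would make the loop run forever).

```python
from collections import defaultdict

def root_to_leaf_paths(edges):
    """Return every root-to-leaf path for (parent, child) pairs.

    Hypothetical helper; assumes the edges form a tree or forest (no cycles).
    """
    children = defaultdict(list)   # parent -> children, in input order
    has_parent = set()             # every node that appears as a child
    nodes, seen = [], set()        # all nodes, in first-seen order
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
        for node in (parent, child):
            if node not in seen:
                seen.add(node)
                nodes.append(node)

    # Roots are nodes that never appear as a child.
    roots = [n for n in nodes if n not in has_parent]

    paths = []
    stack = [[r] for r in reversed(roots)]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)     # reached a leaf: the path is complete
        else:
            # Push in reverse so children are visited in input order.
            for kid in reversed(kids):
                stack.append(path + [kid])
    return paths

# Edge list taken from the question's sample data.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"),
         ("c", "f"), ("f", "g"), ("d", "h")]
paths = root_to_leaf_paths(edges)
print(paths)
```

The resulting list of lists can be passed to pd.DataFrame in the same way as the networkx paths, and rows shorter than the longest path are padded with missing values.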