how to filter a dataframe based on a pattern?

Time:10-18

I have this dataframe

  col1   col2  col3  
0   a      b     1
1   b      c     2
2   c      d     3
3   d      a     4 
4   k      g     5  
5   w      x     6
6   x      z     7
7   z      w     8
8   r      w     9

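I want to keep only the rows that are part of a "cycle" pattern, where following col1 to col2 from row to row eventually leads back to the starting value (for example a → b → c → d → a).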

Expected output:

   col1   col2  col3  
0   a      b     1
1   b      c     2
2   c      d     3
3   d      a     4  
5   w      x     6
6   x      z     7
7   z      w     8

Is what I'm asking possible?

CodePudding user response:

This looks like a graph problem, which you can solve using networkx by treating each row as a directed edge from col1 to col2:

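import networkx as nx

# Build a directed graph in which each row is an edge col1 -> col2
G = nx.from_pandas_edgelist(df, source='col1', target='col2',
                            create_using=nx.DiGraph)

# Collect every node that belongs to at least one simple cycle
nodes = {n for cycle in nx.simple_cycles(G) for n in cycle}
# {'a', 'b', 'c', 'd', 'w', 'x', 'z'}

# Keep the rows whose source and target are both cycle nodes
out = df.loc[df['col1'].isin(nodes) & df['col2'].isin(nodes)]
# or
# out = df[df[['col1', 'col2']].isin(nodes).all(axis=1)]

print(out)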

output:

  col1 col2  col3
0    a    b     1
1    b    c     2
2    c    d     3
3    d    a     4
5    w    x     6
6    x    z     7
7    z    w     8

graph of the output: [image: output graph]
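
One caveat with the node-based filter: it keeps any row whose two endpoints each sit on some cycle, even when the row itself only links two separate cycles. Below is a minimal sketch of a stricter variant, which keeps a row only if its own edge lies on a cycle, i.e. both endpoints fall in the same strongly connected component. The helper name rows_on_cycles and the extra d -> w row are made up for illustration, and the sample data takes row 6 as x -> z, as in the printed output.

```python
import networkx as nx
import pandas as pd

# Sample data from the question (row 6 assumed to be x -> z, as in the output above)
df = pd.DataFrame({
    "col1": list("abcdkwxzr"),
    "col2": list("bcdagxzww"),
    "col3": range(1, 10),
})


def rows_on_cycles(frame, source="col1", target="col2"):
    """Keep rows whose edge source -> target lies on a directed cycle."""
    G = nx.from_pandas_edgelist(frame, source=source, target=target,
                                create_using=nx.DiGraph)
    # Map each node to the id of its strongly connected component (SCC)
    comp = {}
    for i, scc in enumerate(nx.strongly_connected_components(G)):
        for node in scc:
            comp[node] = i
    # An edge is on a cycle exactly when both endpoints share an SCC
    same = frame[source].map(comp) == frame[target].map(comp)
    return frame[same]


print(rows_on_cycles(df).index.tolist())
# [0, 1, 2, 3, 5, 6, 7]

# Hypothetical extra row d -> w that links the two cycles without being on one
extra = pd.DataFrame({"col1": ["d"], "col2": ["w"], "col3": [10]})
bridged = pd.concat([df, extra], ignore_index=True)

print(rows_on_cycles(bridged).index.tolist())
# [0, 1, 2, 3, 5, 6, 7]  (row 9 is dropped; the isin() filter would keep it)
```

On the data in the question both approaches give the same rows, so the simpler isin() version is fine unless your edges can connect different cycles.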
