Home > Back-end >  Pandas: Determine which rows are connected by a matching value in the next
Pandas: Determine which rows are connected by a matching value in the next

Time:10-20

I have a dataframe like so:

      J1    J2    J3    J4
0     551    5    552   553
1     551   554    2     5
2      2    554   555   556
3      7     6    557   558
4     559    9    560   561

The goal is to determine which rows are connected to one another. For example: rows 0, 1, and 2 have a matching value that connects it to the next (551 in row 0 and 1, and 554 in row 1 and 2). Once that is determined, I need to isolate those rows into its own separate chunk of data. It should work for any row in the dataframe, not necessarily just the next row. I can't quite figure out how to do this. Any ideas?

CodePudding user response:

As you dataset is small, you can use numpy broadcasting to perform all comparisons:

The code below gives you the number of connected rows (I added an extra connected row for the example):

a = df.values

b = (a==a[:,None]).sum(2)
np.fill_diagonal(b, 0)

df['connected'] = b.sum(0)

output:

     0    1    2    3  connected
0  551    5  552  553          1
1  551  554    2    5          3
2    2  554  555  556          1
3    7    6  557  558          0
4  559    9  560  561          0
5  500    0    2    0          1

CodePudding user response:

Finding the connected successive rows:

You can compare with the next row using shift any:

mask = df.eq(df.shift(-1)).any(1)
df['connected'] = mask|mask.shift()

output:

    J1   J2   J3   J4  connected
0  551    5  552  553       True
1  551  554    2    5       True
2    2  554  555  556       True
3    7    6  557  558      False
4  559    9  560  561      False
  • Related