I have a dataframe like so:
J1 J2 J3 J4
0 551 5 552 553
1 551 554 2 5
2 2 554 555 556
3 7 6 557 558
4 559 9 560 561
The goal is to determine which rows are connected to one another. For example: rows 0, 1, and 2 have a matching value that connects it to the next (551 in row 0 and 1, and 554 in row 1 and 2). Once that is determined, I need to isolate those rows into its own separate chunk of data. It should work for any row in the dataframe, not necessarily just the next row. I can't quite figure out how to do this. Any ideas?
CodePudding user response:
As you dataset is small, you can use numpy broadcasting to perform all comparisons:
The code below gives you the number of connected rows (I added an extra connected row for the example):
a = df.values
b = (a==a[:,None]).sum(2)
np.fill_diagonal(b, 0)
df['connected'] = b.sum(0)
output:
0 1 2 3 connected
0 551 5 552 553 1
1 551 554 2 5 3
2 2 554 555 556 1
3 7 6 557 558 0
4 559 9 560 561 0
5 500 0 2 0 1
CodePudding user response:
Finding the connected successive rows:
You can compare with the next row using shift
any
:
mask = df.eq(df.shift(-1)).any(1)
df['connected'] = mask|mask.shift()
output:
J1 J2 J3 J4 connected
0 551 5 552 553 True
1 551 554 2 5 True
2 2 554 555 556 True
3 7 6 557 558 False
4 559 9 560 561 False