Row selection in data frame based on condition

Time:12-28

The DataFrame below holds time series data:

| datetime                | mc | quality | v0 | v1 | v2 | v5 | v6 | v7   | v8 | v9 | v10 |
|-------------------------|----|---------|----|----|----|----|----|------|----|----|-----|
| 2021-12-27 14:00:56.420 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5515 | 0  | 0  | 0   |
| 2021-12-27 14:01:25.267 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5507 | 0  | 0  | 0   |
| 2021-12-27 14:01:54.113 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5524 | 0  | 0  | 0   |
| 2021-12-27 14:02:22.897 | L2 | 192     | 1  | 5  | 6  | 1  | 0  | 5513 | 0  | 0  | 0   |
| 2021-12-27 14:02:51.747 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5455 | 0  | 0  | 0   |
| 2021-12-27 14:03:51.743 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5522 | 7  | 3  | 9   |
| 2021-12-27 14:04:51.743 | L2 | 192     | 1  | 4  | 0  | 0  | 0  | 5522 | 7  | 3  | 9   |

Requirement: I need to select the rows in which either v1 or v9 is not equal to 0. Expected result:

| datetime                | mc | quality | v0 | v1 | v2 | v5 | v6 | v7   | v8 | v9 | v10 |
|-------------------------|----|---------|----|----|----|----|----|------|----|----|-----|
| 2021-12-27 14:02:22.897 | L2 | 192     | 1  | 5  | 6  | 1  | 0  | 5513 | 0  | 0  | 0   |
| 2021-12-27 14:03:51.743 | L2 | 192     | 1  | 0  | 0  | 0  | 0  | 5522 | 7  | 3  | 9   |
| 2021-12-27 14:04:51.743 | L2 | 192     | 1  | 4  | 0  | 0  | 0  | 5522 | 7  | 3  | 9   |

Trials:

df = df[df.v1 != 0]

If I filter on a single column, it works.

df = df[(df.v1 != 0) & (df.v9 != 0)]

But if I combine the two conditions like this, it doesn't work: I get an empty data frame.

I would like to know an efficient way to do this.

CodePudding user response:

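Use | (element-wise OR) instead of &. With &, a row is kept only when both conditions are true; with |, it is kept when at least one is true: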

df1 = df[(df.v1 != 0) | (df.v9 != 0)]
print(df1)
                 datetime  mc  quality  v0  v1  v2  v5  v6    v7  v8  v9  v10
3 2021-12-27 14:02:22.897  L2      192   1   5   6   1   0  5513   0   0    0
5 2021-12-27 14:03:51.743  L2      192   1   0   0   0   0  5522   7   3    9
6 2021-12-27 14:04:51.743  L2      192   1   4   0   0   0  5522   7   3    9
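
To see the difference between the two operators, here is a minimal, self-contained sketch. It assumes the sample rows from the question and rebuilds only the datetime, v1 and v9 columns; the variable names both and either are just illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2021-12-27 14:00:56.420",
        "2021-12-27 14:01:25.267",
        "2021-12-27 14:01:54.113",
        "2021-12-27 14:02:22.897",
        "2021-12-27 14:02:51.747",
        "2021-12-27 14:03:51.743",
        "2021-12-27 14:04:51.743",
    ]),
    "v1": [0, 0, 0, 5, 0, 0, 4],
    "v9": [0, 0, 0, 0, 0, 3, 3],
})

# AND: a row is kept only if v1 and v9 are both non-zero.
both = df[(df.v1 != 0) & (df.v9 != 0)]

# OR: a row is kept if v1 or v9 (or both) is non-zero.
either = df[(df.v1 != 0) | (df.v9 != 0)]

print(both.index.tolist())    # [6]
print(either.index.tolist())  # [3, 5, 6]
```

The parentheses around each comparison are needed because & and | bind more tightly than != in Python.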

Or compare both columns at once and use DataFrame.any with axis=1 to keep the rows that have at least one True in the resulting boolean DataFrame:

df1 = df[(df[['v1','v9']] != 0).any(axis=1)]
print(df1)
                 datetime  mc  quality  v0  v1  v2  v5  v6    v7  v8  v9  v10
3 2021-12-27 14:02:22.897  L2      192   1   5   6   1   0  5513   0   0    0
5 2021-12-27 14:03:51.743  L2      192   1   0   0   0   0  5522   7   3    9
6 2021-12-27 14:04:51.743  L2      192   1   4   0   0   0  5522   7   3    9
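
The any(axis=1) form extends naturally to a longer list of columns. The sketch below wraps it in a small helper; the function name rows_with_any_nonzero is hypothetical (not part of pandas), and the data is a reduced copy of the question's v1, v8 and v9 columns.

```python
import pandas as pd

def rows_with_any_nonzero(frame, cols):
    """Return the rows of frame where at least one column in cols is non-zero.

    Hypothetical helper for illustration; it only combines DataFrame.ne
    and DataFrame.any.
    """
    mask = frame[cols].ne(0).any(axis=1)  # one boolean per row
    return frame[mask]

# Reduced copy of the sample data from the question.
df = pd.DataFrame({
    "v1": [0, 0, 0, 5, 0, 0, 4],
    "v8": [0, 0, 0, 0, 0, 7, 7],
    "v9": [0, 0, 0, 0, 0, 3, 3],
})

result = rows_with_any_nonzero(df, ["v1", "v9"])
print(result.index.tolist())  # [3, 5, 6]
```

For two columns this selects the same rows as the explicit | expression; the list-based form mainly saves typing when many columns have to be checked.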