Home > Software design >  Choose the rows with special conditions in dataframe
Choose the rows with special conditions in dataframe

Time:05-15

I have a dataframe, I want to choose the rows which their average value for column val1 until val4 is greater than 1 and the number of zeros for the row is not greater than 1. For example, in the following df:

    id  val0  val1  val2  val3  val4
0  100     5     0     0     0     0
1  200    50   -10    10   -10    10
2  300     3     0     1     1     1
3  400     4     0     3     2     5
4  500     5     3     7     0     9
5  800     3     1     0     0     6

data = {'id': [100, 200, 300, 400, 500, 800],
 'val0': [5, 50, 3, 4, 5, 3],
 'val1': [0, -10, 0, 0, 3, 1],
 'val2': [0, 10, 1, 3, 7, 0],
 'val3': [0, -10, 1, 2, 0, 0],
 'val4': [0, 10, 1, 5, 9, 6]}
df = pd.DataFrame(data)

row 0: has 4 zeros, so we don't consider this row.

row 1: the average from val1-val4 is 0, we don't consider that.

row 2: the average from val1-val4 is 0.75, we don't consider that.

row 3: we consider that. Since two conditions are satisfied.

row 4: we consider that. Since two conditions are satisfied.

row 5: we have 2 zeros, so we don't consider that. The desired output is:

    id  val0  val1  val2  val3  val4
3  400     4     0     3     2     5
4  500     5     3     7     0     9

CodePudding user response:

IUUC, you can use df.filter(regex='val[1-4]') to filter the columns that contians val1-4 and use mean(axis=1) and sum(axis=1) to operate on columns.

val = df.filter(regex='val[1-4]')
df = df[val.mean(axis=1).gt(1) & df.eq(0).sum(axis=1).le(1)]
print(df)

    id  val0  val1  val2  val3  val4
3  400     4     0     3     2     5
4  500     5     3     7     0     9

CodePudding user response:

df[df[['val1', 'val2', 'val3', 'val4']].mean(axis=1).gt(1) & df.eq(0).sum(axis=1).le(1)]

Output:

    id  val0  val1  val2  val3  val4
3  400     4     0     3     2     5
4  500     5     3     7     0     9
  • Related