I have a dataframe, I want to choose the rows which their average value for column val1 until val4 is greater than 1 and the number of zeros for the row is not greater than 1. For example, in the following df:
id val0 val1 val2 val3 val4
0 100 5 0 0 0 0
1 200 50 -10 10 -10 10
2 300 3 0 1 1 1
3 400 4 0 3 2 5
4 500 5 3 7 0 9
5 800 3 1 0 0 6
data = {'id': [100, 200, 300, 400, 500, 800],
'val0': [5, 50, 3, 4, 5, 3],
'val1': [0, -10, 0, 0, 3, 1],
'val2': [0, 10, 1, 3, 7, 0],
'val3': [0, -10, 1, 2, 0, 0],
'val4': [0, 10, 1, 5, 9, 6]}
df = pd.DataFrame(data)
row 0: has 4 zeros, so we don't consider this row.
row 1: the average from val1-val4 is 0, we don't consider that.
row 2: the average from val1-val4 is 0.75, we don't consider that.
row 3: we consider that. Since two conditions are satisfied.
row 4: we consider that. Since two conditions are satisfied.
row 5: we have 2 zeros, so we don't consider that. The desired output is:
id val0 val1 val2 val3 val4
3 400 4 0 3 2 5
4 500 5 3 7 0 9
CodePudding user response:
IUUC, you can use df.filter(regex='val[1-4]')
to filter the columns that contians val1-4
and use mean(axis=1)
and sum(axis=1)
to operate on columns.
val = df.filter(regex='val[1-4]')
df = df[val.mean(axis=1).gt(1) & df.eq(0).sum(axis=1).le(1)]
print(df)
id val0 val1 val2 val3 val4
3 400 4 0 3 2 5
4 500 5 3 7 0 9
CodePudding user response:
df[df[['val1', 'val2', 'val3', 'val4']].mean(axis=1).gt(1) & df.eq(0).sum(axis=1).le(1)]
Output:
id val0 val1 val2 val3 val4
3 400 4 0 3 2 5
4 500 5 3 7 0 9