I have the following dataframe:
mba_p ssc_p hsc_p degree_p etest_p
0 58.80 67.00 91.00 58.00 55.0
1 66.28 79.33 78.33 77.48 86.5
2 57.80 65.00 68.00 64.00 75.0
3 59.43 56.00 52.00 52.00 66.0
4 55.50 85.80 73.60 73.30 96.8
... ... ... ... ... ...
210 74.49 80.60 82.00 77.60 91.0
211 53.62 58.00 60.00 72.00 74.0
212 69.72 67.00 67.00 73.00 59.0
213 60.23 74.00 66.00 58.00 70.0
214 60.22 62.00 58.00 53.00 89.0
I would like to filter the data based on this condition:
Extract all the rows where at least one value is greater than any of the values in this list:
[66.255, 75.7, 73.0, 72.0, 83.5]
tl;dr: the first row is a candidate since 67 and 91 are greater than more than one value in the list above (the values in the list are the 75th percentile of every column basically). Similarly, extract all such rows.
I tried using np.where() using the steps shown here but since that is filtering based only on one value and not a list, I can't figure how to extend that to this situation. A simple condition check would return rows only where all values are greater than equal I guess (unless I'm missing anything). Sorry if this sounds a little convoluted but not able to crack it yet.
CodePudding user response:
You can use a mask of dataframe to get rows with values greater than the minimum in the array
num = min(66.255, 75.7, 73.0, 72.0, 83.5)
print(df.loc[df[df > num].any(axis=1)])
Output
mba_p ssc_p hsc_p degree_p etest_p
0 58.80 67.00 91.00 58.00 55.0
1 66.28 79.33 78.33 77.48 86.5
2 57.80 65.00 68.00 64.00 75.0
4 55.50 85.80 73.60 73.30 96.8
5 74.49 80.60 82.00 77.60 91.0
6 53.62 58.00 60.00 72.00 74.0
7 69.72 67.00 67.00 73.00 59.0
8 60.23 74.00 66.00 58.00 70.0
9 60.22 62.00 58.00 53.00 89.0
CodePudding user response:
a = np.array([66.255, 75.7, 73.0, 72.0, 83.5])
df[df.apply(lambda x: (x.to_numpy() > a[:, None]).any(), axis = 1)]
mba_p ssc_p hsc_p degree_p etest_p
0 58.80 67.00 91.00 58.00 55.0
1 66.28 79.33 78.33 77.48 86.5
2 57.80 65.00 68.00 64.00 75.0
4 55.50 85.80 73.60 73.30 96.8
210 74.49 80.60 82.00 77.60 91.0
211 53.62 58.00 60.00 72.00 74.0
212 69.72 67.00 67.00 73.00 59.0
213 60.23 74.00 66.00 58.00 70.0
214 60.22 62.00 58.00 53.00 89.0