Keep rows according to condition in Pandas-CodePudding

I am looking for a code to find rows that matches a condition and keep those rows.

In the image example, I wish to keep all the apples with amt1 => 5 and amt2 < 5. I also want to keep the bananas with amt1 => 1 and amt2 < 5 (highlighted red in image). There are many other fruits in the list that I have to filter for (maybe about 10 fruits).

image example

Currently, I am filtering it individually (ie. creating a dataframe that filters out the red and small apples and another dataframe that filters out the green and big bananas and using concat to join the dataframes together afterwards). However, this process takes a long time to run because the dataset is huge. I am looking for a faster way (like filtering it in the dataframe itself without having to create a new dataframes). I also have to use column index instead of column names as the column name changes according to the date.

Hopefully what I said makes sense. Would appreciate any help!

CodePudding user response：

I am not quite sure I understand your requirements because I don't understand how the conditions for the rows to keep are formulated.

One thing you can use to combine multiple criteria for selecting data is the query method of the dataframe:

import pandas as pd

pd.DataFrame([
    ['Apple', 5, 1],
    ['Apple', 4, 2], 
    ['Orange', 3, 3], 
    ['Banana', 2, 4], 
    ['Banana', 1, 5]], 
    columns=['Fruits', 'Amt1', 'Amt2'])

df.query('(Fruits == "Apple" & (Amt1 >= 5 & Amt2 < 5)) | (Fruits == "Banana" & (Amt1 >= 1 & Amt2 < 5))')

CodePudding user response：

You might use filter combined with itertuples following way

import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4,5],"y":[10,20,30,40,50]})
def keep(row):
    return row[0] >= 2 and row[1] <= 40
df_filtered = pd.DataFrame(filter(keep,df.itertuples())).set_index("Index")
print(df_filtered)

gives output

       x   y
Index       
2      3  30
3      4  40
4      5  50

Explanation: keep is function which should return True for rows to keep False for rows to jettison. .itertuples() provides iterable of tuples, which are feed to filter which select records where keep evaluates to True, these selected rows are used to create new DataFrame. After that is done I set index so Index is corresponding to original DataFrame. Depending on your use case you might elect to not set index.