How to drop rows that do not meet multiple conditions python


I have a large df of the following structure:

Grade   Price   Group
A   2   apple
A   10  apple
A   9   apple
A   10  apple
B   8   apple
B   7   apple
B   6   apple
B   10  apple
A   12  berry
A   11  berry
A   11  berry
A   12  berry
A   10  berry
B   9   berry
B   9   berry
B   10  berry
B   11  berry

I need to filter the df based on the conditions below for each Group. For each unique Group, a row should be kept only if its Price falls within the allowed range for its Grade; rows that violate the range for their Group should be excluded.

apple
A   9-10
B   7-8

berry
A   11-12
B   9-10

The output df should then contain only the rows that meet the criteria.

Grade   Price   Group
A   10  apple
A   9   apple
A   10  apple
B   8   apple
B   7   apple
A   12  berry
A   11  berry
A   11  berry
A   12  berry
B   9   berry
B   9   berry
B   10  berry

I currently filter the dataframe separately for each condition and concatenate the resulting dataframes.

a = df[(df['Group'] == 'apple') & (df['Grade'] == 'A') & (df['Price'].between(9, 10))]

b = df[(df['Group'] == 'apple') & (df['Grade'] == 'B') & (df['Price'].between(7, 8))]

res = pd.concat([a,b])

but writing a separate dataframe for each condition does not scale.

An efficient solution that excludes the rows that do not meet the criteria would be helpful.

CodePudding user response:

You can create a dictionary for your conditions and use apply:

dct = {'apple': {'A': range(9, 11), 'B': range(7, 9)},
       'berry': {'A': range(11, 13), 'B': range(9, 11)}}
df[df.apply(lambda x: x.Price in dct[x.Group][x.Grade], axis=1)]
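For reference, running this against the sample data from the question keeps exactly the 12 rows shown in the expected output. Note that the `in range(...)` test only works here because the prices are integers; `range(9, 11)` covers 9 and 10 since the upper bound is exclusive.

```python
import pandas as pd

# Rebuild the sample dataframe from the question
df = pd.DataFrame({
    'Grade': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
              'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Price': [2, 10, 9, 10, 8, 7, 6, 10,
              12, 11, 11, 12, 10, 9, 9, 10, 11],
    'Group': ['apple'] * 8 + ['berry'] * 9,
})

# Allowed integer price ranges per (Group, Grade)
dct = {'apple': {'A': range(9, 11), 'B': range(7, 9)},
       'berry': {'A': range(11, 13), 'B': range(9, 11)}}

# Keep a row only if its Price is inside the range for its Group/Grade
res = df[df.apply(lambda x: x.Price in dct[x.Group][x.Grade], axis=1)]
```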

CodePudding user response:

apples = [row for row in myList if row.Group == 'apple']

Something along these lines would work (not tested) — a list comprehension handles this kind of filtering well.
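A minimal runnable sketch of that idea, iterating the dataframe rows with `itertuples` and checking each row against a bounds mapping (the `bounds` dictionary of `(lo, hi)` tuples is illustrative, not part of the question):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Grade': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
              'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Price': [2, 10, 9, 10, 8, 7, 6, 10,
              12, 11, 11, 12, 10, 9, 9, 10, 11],
    'Group': ['apple'] * 8 + ['berry'] * 9,
})

# Hypothetical (lo, hi) price bounds per (Group, Grade)
bounds = {('apple', 'A'): (9, 10), ('apple', 'B'): (7, 8),
          ('berry', 'A'): (11, 12), ('berry', 'B'): (9, 10)}

# List comprehension over the rows, keeping only in-range prices
rows = [row for row in df.itertuples(index=False)
        if bounds[row.Group, row.Grade][0] <= row.Price <= bounds[row.Group, row.Grade][1]]
res = pd.DataFrame(rows)
```

Bear in mind that iterating rows in Python is much slower than a vectorized pandas filter on a large df.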

CodePudding user response:

Assuming that the dataframe OP shares in the question is df1, and one wants to create a new dataframe, df2, the following filter does the job

df2 = df1[
    (df1['Group'] == 'apple')
    & ((df1['Grade'] == 'A') & (df1['Price'].between(9, 10))
       | (df1['Grade'] == 'B') & (df1['Price'].between(7, 8)))
    | (df1['Group'] == 'berry')
    & ((df1['Grade'] == 'A') & (df1['Price'].between(11, 12))
       | (df1['Grade'] == 'B') & (df1['Price'].between(9, 10)))
]

[Out]:
   Grade  Price  Group
1      A     10  apple
2      A      9  apple
3      A     10  apple
4      B      8  apple
5      B      7  apple
8      A     12  berry
9      A     11  berry
10     A     11  berry
11     A     12  berry
13     B      9  berry
14     B      9  berry
15     B     10  berry

As per OP's request, the logic can be wrapped in a function (note that it relies on numpy)

import numpy as np

def filter_df(df, groups, prices):

    # Collect one boolean mask per (group, grade) combination
    conditions = []
    for group in groups:
        for grade in prices[group]:
            lo, hi = prices[group][grade]
            conditions.append(
                (df['Group'] == group)
                & (df['Grade'] == grade)
                & df['Price'].between(lo, hi)
            )

    # Keep the rows that satisfy any of the conditions
    return df[np.any(conditions, axis=0)]

And apply it to OP's use case as follows (one passes a list with all the groups, and a dictionary mapping each group to its grades and price ranges)

df3 = filter_df(df1, ['apple', 'berry'], {'apple':{'A':[9, 10], 'B':[7, 8]}, 'berry':{'A':[11, 12], 'B':[9, 10]}})

[Out]:
   Grade  Price  Group
1      A     10  apple
2      A      9  apple
3      A     10  apple
4      B      8  apple
5      B      7  apple
8      A     12  berry
9      A     11  berry
10     A     11  berry
11     A     12  berry
13     B      9  berry
14     B      9  berry
15     B     10  berry
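For large dataframes, a fully vectorized alternative (a sketch; the lookup table and the `lo`/`hi` column names are illustrative) is to put the ranges into a small bounds dataframe, merge it onto the data on Group and Grade, and filter with `between`, which avoids the row-by-row `apply`:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'Grade': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
              'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Price': [2, 10, 9, 10, 8, 7, 6, 10,
              12, 11, 11, 12, 10, 9, 9, 10, 11],
    'Group': ['apple'] * 8 + ['berry'] * 9,
})

# Illustrative bounds table: one row per (Group, Grade) pair
bounds = pd.DataFrame({
    'Group': ['apple', 'apple', 'berry', 'berry'],
    'Grade': ['A', 'B', 'A', 'B'],
    'lo':    [9, 7, 11, 9],
    'hi':    [10, 8, 12, 10],
})

# Attach the bounds to each row, then keep in-range prices only
merged = df1.merge(bounds, on=['Group', 'Grade'], how='left')
df3 = merged[merged['Price'].between(merged['lo'], merged['hi'])][df1.columns]
```

With `how='left'` every (Group, Grade) pair not present in the bounds table gets NaN bounds and is dropped by `between`, so unknown groups are excluded rather than raising an error.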