I have a large df of the following structure:
Grade Price Group
A 2 apple
A 10 apple
A 9 apple
A 10 apple
B 8 apple
B 7 apple
B 6 apple
B 10 apple
A 12 berry
A 11 berry
A 11 berry
A 12 berry
A 10 berry
B 9 berry
B 9 berry
B 10 berry
B 11 berry
I need to filter the df based on the below conditions for each Group
. If it is an apple then the Grades should be within the below limit or else exclude those which violate for this Group
. It has to check for each unique Group
variable and check the price is in range of the corresponding grades.
apple
A 9-10
B 7-8
berry
A 11-12
B 9-10
The output df should then be having those only which meet the criteria.
Grade Price Group
A 10 apple
A 9 apple
A 10 apple
B 8 apple
B 7 apple
A 12 berry
A 11 berry
A 11 berry
A 12 berry
B 9 berry
B 9 berry
B 10 berry
I currently filter the dataframe that meet each condition and concatenate the resulting dataframes.
a = df[(df['Group'] == 'apple') & (df['Grade'] == 'A') & (df['Price'].between(9, 10))]
b = df[(df['Group'] == 'apple') & (df['Grade'] == 'B') & (df['Price'].between(7, 8))]
res = pd.concat([a,b])
and this is not optimal to write to multiple data frames for each condition.
Any efficient solution that excludes those that do not meet the criteria would be helpful.
CodePudding user response:
You can create a dictionary for your condition and use apply
:
dct = {'apple': {'A': range(9, 11), 'B': range(7, 9)}, \
'berry': {'A': range(11, 13), 'B': range(9, 11)}}
df[df.apply(lambda x: x.Price in dct[x.Group][x.Grade] , axis=1)]
CodePudding user response:
apples = [apple for apple in myList if apple.Price == 'apple']
Something along this would work, not tested, but list comprehension works well for this problem.
CodePudding user response:
Assuming that the dataframe that OP shares in the question is df1
and one wants to create a new dataframe, df2
, the following filter would do the work
df2 = df1[(df1['Group'] == 'apple') & ((df1['Grade'] == 'A') & (df1['Price'].between(9, 10)) | (df1['Grade'] == 'B') & (df1['Price'].between(7, 8))) | (df1['Group'] == 'berry') & ((df1['Grade'] == 'A') & (df1['Price'].between(11, 12)) | (df1['Grade'] == 'B') & (df1['Price'].between(9, 10)))]
[Out]:
Grade Price Group
1 A 10 apple
2 A 9 apple
3 A 10 apple
4 B 8 apple
5 B 7 apple
8 A 12 berry
9 A 11 berry
10 A 11 berry
11 A 12 berry
13 B 9 berry
14 B 9 berry
15 B 10 berry
As per OP's request, one can use the following function
def filter_df(df, groups, prices):
# Create a list of conditions
conditions = []
# For each group
for group in groups:
# For each grade
for grade in prices[group].keys():
# Create a condition
condition = (df['Group'] == group) & (df['Grade'] == grade) & (df['Price'].between(prices[group][grade][0], prices[group][grade][1]))
# Append the condition to the list of conditions
conditions.append(condition)
# Filter the dataframe
df = df[np.any(conditions, axis=0)]
# Return the filtered dataframe
return df
And apply it for OP's use case as follows (basically one will have to pass a list with all the groups, and a dictionary with the group, grades, and price range)
df3 = filter_df(df1, ['apple', 'berry'], {'apple':{'A':[9, 10], 'B':[7, 8]}, 'berry':{'A':[11, 12], 'B':[9, 10]}})
[Out]:
Grade Price Group
1 A 10 apple
2 A 9 apple
3 A 10 apple
4 B 8 apple
5 B 7 apple
8 A 12 berry
9 A 11 berry
10 A 11 berry
11 A 12 berry
13 B 9 berry
14 B 9 berry
15 B 10 berry