I have a dataframe with three columns. I need to group by Name and sum the Amount column while counting the Fruit column only if it meets a certain criteria. Here's an example of what the dataframe looks like:
Name Fruit Amount
Bob Apple 5
Bob Orange 7
Bill Apple 4
Bill Orange 8
Bill Apple 3
In this example I'd group Bob and Bill's names, sum the Amount for Apple and Orange, but count only the number of occurrences of Apple, so the new dataframe would look like this:
Name Sum Count
Bob 12 1
Bill 15 2
I've tried the following but don't get to the correct answer:
df = pd.pivot_table(
df,
index=['Name'],
aggfunc={'Amount':np.sum}
).rename(columns={'Fruit':'count'})
CodePudding user response:
Use agg
df.groupby('Name').agg(**{'Sum':('Amount','sum'),'Count':('Fruit', lambda x: (x=='Apple').sum())}).reset_index()
Name Sum Count
0 Bill 15 2
1 Bob 12 1
CodePudding user response:
It should be faster if you avoid the anonymous function (depending on the data size):
lump = [df.groupby('Name').Amount.sum(),
df.Fruit.eq('Apple').groupby(df.Name).sum().rename('Count')]
pd.concat(lump, axis = 1)
Amount Count
Name
Bill 15 2
Bob 12 1
You could also create a temporary column, and used a named aggregation; again, the aim is to avoid the anonymous(lambda function):
(df.assign(Count = df.Fruit.eq('Apple'))
.groupby('Name')
.agg(Sum = ('Amount', 'sum'),
Count=('Count', 'sum'))
)
Sum Count
Name
Bill 15 2
Bob 12 1