Python Groupby with Conditions-CodePudding

I have a dataframe with three columns. I need to group by Name and sum the Amount column while counting the Fruit column only if it meets a certain criteria. Here's an example of what the dataframe looks like:

 Name Fruit   Amount
 Bob  Apple   5
 Bob  Orange  7
 Bill Apple   4
 Bill Orange  8
 Bill Apple   3

In this example I'd group Bob and Bill's names, sum the Amount for Apple and Orange, but count only the number of occurrences of Apple, so the new dataframe would look like this:

 Name   Sum   Count
 Bob    12      1
 Bill   15      2

I've tried the following but don't get to the correct answer:

df = pd.pivot_table(
    df,
    index=['Name'],
    aggfunc={'Amount':np.sum}
    ).rename(columns={'Fruit':'count'})

CodePudding user response：

Use agg

df.groupby('Name').agg(**{'Sum':('Amount','sum'),'Count':('Fruit', lambda x: (x=='Apple').sum())}).reset_index()

   Name  Sum  Count
0  Bill   15      2
1   Bob   12      1

CodePudding user response：

It should be faster if you avoid the anonymous function (depending on the data size):


lump = [df.groupby('Name').Amount.sum(), 
        df.Fruit.eq('Apple').groupby(df.Name).sum().rename('Count')]

pd.concat(lump, axis = 1)

      Amount  Count
Name               
Bill      15      2
Bob       12      1

You could also create a temporary column, and used a named aggregation; again, the aim is to avoid the anonymous(lambda function):

(df.assign(Count = df.Fruit.eq('Apple'))
   .groupby('Name')
   .agg(Sum = ('Amount', 'sum'), 
        Count=('Count', 'sum'))
) 
      Sum  Count
Name            
Bill   15      2
Bob    12      1