Home > Blockchain >  How do I use Pandas to get unique values in a column while summing certain values in another column?
How do I use Pandas to get unique values in a column while summing certain values in another column?

Time:10-26

My question is related to another post here How do I Pandas group-by to get sum? but it does not answer my question.

I have this dataframe:

Fruit    Name  Number
Apples   Bob    7
Apples   Bob    8
Apples   Mike   9
Apples   Steve 10
Apples   Bob    1
Oranges  Bob    2
Oranges  Tom   15
Oranges  Mike  57
Oranges  Bob   65
Oranges  Tony   1
Grapes   Bob    1
Grapes   Tom   87
Grapes   Bob   22
Grapes   Bob   12
Grapes   Tony  15
Melons   Mike  10 

I want to get a dataframe where the first column should have all the unique values from 'Fruit' column above; second column should have the sum of values from 'Number' column but only for one person, say Bob from above. If this person does not have the particular fruit, the second column should have 0. Here is the desired output:

Fruit    NumberForBob
Apples     7 8 1=16
Oranges     2 65=67
Grapes     1 22 12=35
Melons     0

I think I need to use a mix of if-statement and groupby function, but I am not able to get the desired output. How can I do this?

CodePudding user response:

Try using a pivot_table

pivot = df.pivot_table(index='Fruit', columns='Name',
                       values='Number', aggfunc=sum)

Name      Bob  Mike  Steve   Tom  Tony
Fruit                                 
Apples   16.0   9.0   10.0   NaN   NaN
Grapes   35.0   NaN    NaN  87.0  15.0
Melons    NaN  10.0    NaN   NaN   NaN
Oranges  67.0  57.0    NaN  15.0   1.0

then

print(pivot['Bob'])

Fruit
Apples     16.0
Grapes     35.0
Melons      NaN
Oranges    67.0
Name: Bob, dtype: float64

you can also add fill_value=0 if you want

CodePudding user response:

If you needed this for all names, you would need to group by Fruit and Name.

However, of you just need Bob: golden rule, do not compute something unnecessarily if you have to discard it immediately.

Filter your data for Bob, compute the groupby.sum, reindex with the unique fruits:

(df.loc[df['Name'].eq('Bob')]
   .groupby('Fruit')['Number'].sum()
   .reindex(df['Fruit'].unique(), fill_value=0)
 )

Output:

Fruit
Apples     16
Oranges    67
Grapes     35
Melons      0
Name: Number, dtype: int64

Alternative output:

(df.loc[df['Name'].eq('Bob')]
   .groupby('Fruit')['Number'].sum()
   .reindex(df['Fruit'].unique(), fill_value=0)
   .reset_index(name='NumberForBob')
 )

     Fruit  NumberForBob
0   Apples            16
1  Oranges            67
2   Grapes            35
3   Melons             0
  • Related