How do I group a pandas column to create a new percentage column-CodePudding

I've got a pandas dataframe that looks like this:

mydict ={
        'person': ['Jenny', 'Jenny', 'David', 'David', 'Max', 'Max'],
        'fruit': ['Apple', 'Orange', 'Apple', 'Orange', 'Apple', 'Orange'],
        'eaten': [25, 75, 15, 5, 10, 10]
    }

df = pd.DataFrame(mydict) 

    person fruit   eaten
    Jenny  Apple   25
    Jenny  Orange  75
    David  Apple   15
    David  Orange  5
    Max    Apple   10
    Max    Orange  10

Which I'd like to convert into:

person  apple_percentage  orange_percentage
Jenny   0.25              0.75
David   0.75              0.25
Max     0.50              0.50

I'm guessing that I'll have to use groupby in some capacity to do this, but can't figure out a clean Pythonic way of doing so?

CodePudding user response：

Use DataFrame.pivot with division by sums:

df = df.pivot('person','fruit','eaten').add_suffix('_percentage')
df = df.div(df.sum(axis=1), axis=0)
print (df)
fruit   Apple_percentage  Orange_percentage
person                                     
David               0.75               0.25
Jenny               0.25               0.75
Max                 0.50               0.50

CodePudding user response：

Another option is pandas' crosstab:

pd.crosstab(index = df.person, 
            columns = df.fruit, 
            values = df.eaten, 
            aggfunc = 'mean', 
            normalize='index')
 
fruit   Apple  Orange
person               
David    0.75    0.25
Jenny    0.25    0.75
Max      0.50    0.50