I've got a pandas
dataframe that looks like this:
mydict ={
'person': ['Jenny', 'Jenny', 'David', 'David', 'Max', 'Max'],
'fruit': ['Apple', 'Orange', 'Apple', 'Orange', 'Apple', 'Orange'],
'eaten': [25, 75, 15, 5, 10, 10]
}
df = pd.DataFrame(mydict)
person fruit eaten
Jenny Apple 25
Jenny Orange 75
David Apple 15
David Orange 5
Max Apple 10
Max Orange 10
Which I'd like to convert into:
person apple_percentage orange_percentage
Jenny 0.25 0.75
David 0.75 0.25
Max 0.50 0.50
I'm guessing that I'll have to use groupby
in some capacity to do this, but can't figure out a clean Pythonic way of doing so?
CodePudding user response:
Use DataFrame.pivot
with division by sum
s:
df = df.pivot('person','fruit','eaten').add_suffix('_percentage')
df = df.div(df.sum(axis=1), axis=0)
print (df)
fruit Apple_percentage Orange_percentage
person
David 0.75 0.25
Jenny 0.25 0.75
Max 0.50 0.50
CodePudding user response:
Another option is pandas' crosstab:
pd.crosstab(index = df.person,
columns = df.fruit,
values = df.eaten,
aggfunc = 'mean',
normalize='index')
fruit Apple Orange
person
David 0.75 0.25
Jenny 0.25 0.75
Max 0.50 0.50