Evening, I have a dataframe that I want to reshape. there are duplicate id vars for some columns, and i want the duplicate values to appear as new rows
my data looks like this, and i want to have the ids as a row, with the group as column, and the choices as the values. if there are multiple choices picked per id within a group, then the row should be replicated as shown below. when I use pivot I end up just getting the mean or sum of the combined values e.g. 11.5 for id i1, group1. all tips very welcome thank you
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
'group': ['group1','group1','group2','group3','group1','group2','group2','group3','group1','group2'],
'choice':[12,11,12,14,11,19,9,7,8,9]})
pd.DataFrame({'id': ['i1','i1','i2','i2','i3'],
'group1': [12,11,11,np.nan,8],
'group2': [12,np.nan,19,9,9],
'group3':[np.nan,np.nan,14,7,np.nan]})
CodePudding user response:
Use GroupBy.cumcount
with Series.unstack
and DataFrame.droplevel
:
g = df.groupby(['id','group']).cumcount().add(1)
df = (df.set_index(['id','group', g])['choice']
.unstack(level=1)
.droplevel(level=1)
.rename_axis(None,axis=1)
.reset_index())
print (df)
id group1 group2 group3
0 i1 12.0 12.0 NaN
1 i1 11.0 NaN NaN
2 i2 11.0 19.0 14.0
3 i2 NaN 9.0 7.0
4 i3 8.0 9.0 NaN