Home > Back-end >  pivot df with duplicates as new rows
pivot df with duplicates as new rows

Time:12-03

Evening, I have a dataframe that I want to reshape. there are duplicate id vars for some columns, and i want the duplicate values to appear as new rows

my data looks like this, and i want to have the ids as a row, with the group as column, and the choices as the values. if there are multiple choices picked per id within a group, then the row should be replicated as shown below. when I use pivot I end up just getting the mean or sum of the combined values e.g. 11.5 for id i1, group1. all tips very welcome thank you

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
    'group': ['group1','group1','group2','group3','group1','group2','group2','group3','group1','group2'],
    'choice':[12,11,12,14,11,19,9,7,8,9]})
pd.DataFrame({'id': ['i1','i1','i2','i2','i3'],
              'group1': [12,11,11,np.nan,8],
              'group2': [12,np.nan,19,9,9],
              'group3':[np.nan,np.nan,14,7,np.nan]})

CodePudding user response:

Use GroupBy.cumcount with Series.unstack and DataFrame.droplevel:

g = df.groupby(['id','group']).cumcount().add(1)

df = (df.set_index(['id','group', g])['choice']
        .unstack(level=1)
        .droplevel(level=1)
        .rename_axis(None,axis=1)
        .reset_index())
print (df)
   id  group1  group2  group3
0  i1    12.0    12.0     NaN
1  i1    11.0     NaN     NaN
2  i2    11.0    19.0    14.0
3  i2     NaN     9.0     7.0
4  i3     8.0     9.0     NaN
  • Related