Pandas Long to Wide for Categorical Dataframe

Usually, when we want to reshape a DataFrame from long to wide in pandas, we use pivot, pivot_table, unstack, or groupby, but those only work well when the values can be aggregated. How do we unmelt a categorical DataFrame, where there is nothing to aggregate?

Example:

import numpy as np
import pandas as pd

d = {'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
     'Color1': ['Red', 'Yellow', 'Red', 'Green'],
     'Color2': ['Red', 'Red', 'Green', 'Brown'],
     'Color3': [np.nan, np.nan, 'Red', np.nan]}
df = pd.DataFrame(d)
df

    Fruit   Color1  Color2  Color3
0   Apple   Red     Red     NaN
1   Apple   Yellow  Red     NaN
2   Apple   Red     Green   Red
3   Kiwi    Green   Brown   NaN

Should become something like this:

d = {'Fruit':['Apple','Kiwi'], 
     'Color1':['Red','Green'],
     'Color1_1':['Yellow',np.nan],
     'Color1_2':['Red',np.nan],
     'Color2':['Red', 'Brown'],
     'Color2_1':['Red',np.nan],
     'Color2_2':['Green',np.nan],
     'Color3':[np.nan,np.nan],
     'Color3_1':[np.nan,np.nan],
     'Color3_2':['Red',np.nan]
    }

pd.DataFrame(d)

    Fruit   Color1  Color1_1    Color1_2    Color2  Color2_1    Color2_2    Color3  Color3_1    Color3_2
0   Apple   Red     Yellow      Red         Red     Red         Green       NaN     NaN         Red
1   Kiwi    Green   NaN         NaN         Brown   NaN         NaN         NaN     NaN         NaN

CodePudding user response:

Try groupby with cumcount to number the rows within each Fruit group (0, 1, 2, ...), then pivot using that counter as the columns, and finally flatten the resulting two-level column names:

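# Number each row within its Fruit group, then pivot on that counter
df = df.assign(idx=df.groupby('Fruit').cumcount()).pivot(index='Fruit', columns='idx')
# Flatten the (column, counter) pairs: bare name for counter 0, '_<n>' suffix otherwise
print(df.set_axis([f'{x}_{y}' if y != 0 else x for x, y in df.columns], axis=1).reset_index())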

Output:

   Fruit Color1 Color1_1 Color1_2 Color2 Color2_1 Color2_2 Color3 Color3_1 Color3_2
0  Apple    Red   Yellow      Red    Red      Red    Green    NaN      NaN      Red
1   Kiwi  Green      NaN      NaN  Brown      NaN      NaN    NaN      NaN      NaN

Matches your output exactly.
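
For reuse, the same cumcount-then-pivot idea can be wrapped in a small helper. This is only a sketch: the function name widen_by_occurrence and the temporary column name idx are made up for illustration, and it assumes the input has no column already called idx.

```python
import numpy as np
import pandas as pd


def widen_by_occurrence(df, key):
    """Spread repeated rows per `key` into suffixed columns (hypothetical helper)."""
    # Position of each row within its group: 0 for the first, 1 for the second, ...
    idx = df.groupby(key).cumcount()
    # Pivot so that every (original column, position) pair becomes one column.
    wide = df.assign(idx=idx).pivot(index=key, columns="idx")
    # Flatten the two-level columns: keep the bare name for position 0,
    # append "_<position>" for the rest.
    wide.columns = [col if n == 0 else f"{col}_{n}" for col, n in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Kiwi"],
    "Color1": ["Red", "Yellow", "Red", "Green"],
    "Color2": ["Red", "Red", "Green", "Brown"],
    "Color3": [np.nan, np.nan, "Red", np.nan],
})

wide = widen_by_occurrence(df, "Fruit")
print(wide)
```

Groups with fewer rows than the largest group (Kiwi here) are padded with NaN, since pivot fills the missing (Fruit, position) combinations.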