Usually, when we want to reshape a dataframe from long to wide in Pandas, we use pivot, pivot_table, unstack or groupby, but those only work well when there is something to aggregate (or when every index/column pair is unique). How do we "unmelt" a dataframe of categorical values in which the same key appears on several rows?
Example:
import numpy as np
import pandas as pd
d = {'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
     'Color1': ['Red', 'Yellow', 'Red', 'Green'],
     'Color2': ['Red', 'Red', 'Green', 'Brown'], 'Color3': [np.nan, np.nan, 'Red', np.nan]}
df = pd.DataFrame(d)
print(df)
Fruit Color1 Color2 Color3
0 Apple Red Red NaN
1 Apple Yellow Red NaN
2 Apple Red Green Red
3 Kiwi Green Brown NaN
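A minimal sketch of why the usual tools struggle with this frame, assuming the example data above: after melting to long form, a plain pivot raises a ValueError because 'Apple' has three rows per colour column, while pivot_table only succeeds by aggregating. The aggfunc='first' choice here is arbitrary and purely illustrative; any aggregation collapses Apple's three rows into one, which is exactly the data loss the question wants to avoid.

```python
import numpy as np
import pandas as pd

d = {'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
     'Color1': ['Red', 'Yellow', 'Red', 'Green'],
     'Color2': ['Red', 'Red', 'Green', 'Brown'],
     'Color3': [np.nan, np.nan, 'Red', np.nan]}
df = pd.DataFrame(d)

# Long form: one row per (Fruit, original column, value)
long = df.melt(id_vars='Fruit')

# pivot needs unique (index, columns) pairs; ('Apple', 'Color1') occurs
# three times, so pandas refuses to reshape
try:
    long.pivot(index='Fruit', columns='variable', values='value')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table works only by aggregating; 'first' keeps the first non-null
# value per cell and silently discards the other Apple rows
first_only = long.pivot_table(index='Fruit', columns='variable',
                              values='value', aggfunc='first')
print(first_only)
```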
It should become something like this:
d = {'Fruit':['Apple','Kiwi'],
'Color1':['Red','Green'],
'Color1_1':['Yellow',np.nan],
'Color1_2':['Red',np.nan],
'Color2':['Red', 'Brown'],
'Color2_1':['Red',np.nan],
'Color2_2':['Green',np.nan],
'Color3':[np.nan,np.nan],
'Color3_1':[np.nan,np.nan],
'Color3_2':['Red',np.nan]
}
pd.DataFrame(d)
Fruit Color1 Color1_1 Color1_2 Color2 Color2_1 Color2_2 Color3 Color3_1 Color3_2
0 Apple Red Yellow Red Red Red Green NaN NaN Red
1 Kiwi Green NaN NaN Brown NaN NaN NaN NaN NaN
Answer:
Try groupby with cumcount to number the rows within each Fruit (0, 1, 2, ...), then pivot using that number as the columns, and finally flatten the resulting column names:
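# df is the question's frame; number rows within each Fruit, then pivot on that number
df = df.assign(idx=df.groupby('Fruit').cumcount()).pivot(index='Fruit', columns='idx')
# flatten the (column, idx) MultiIndex: idx 0 keeps the bare name, others get a _idx suffix
print(df.set_axis([f'{x}_{y}' if y != 0 else x for x, y in df.columns], axis=1).reset_index())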
Output:
Fruit Color1 Color1_1 Color1_2 Color2 Color2_1 Color2_2 Color3 Color3_1 Color3_2
0 Apple Red Yellow Red Red Red Green NaN NaN Red
1 Kiwi Green NaN NaN Brown NaN NaN NaN NaN NaN
Matches your output exactly.
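For readers who want to see what each link of that chain does, here is a step-by-step sketch, assuming the question's frame; the names wide_multi, wide and alt are illustrative only. It also shows that set_index plus unstack, one of the tools mentioned in the question, yields the same MultiIndex columns as pivot once the cumcount key makes every (Fruit, idx) pair unique.

```python
import numpy as np
import pandas as pd

d = {'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
     'Color1': ['Red', 'Yellow', 'Red', 'Green'],
     'Color2': ['Red', 'Red', 'Green', 'Brown'],
     'Color3': [np.nan, np.nan, 'Red', np.nan]}
df = pd.DataFrame(d)

# Step 1: cumcount gives each row its position inside its Fruit group,
# so Apple rows get 0, 1, 2 and the single Kiwi row gets 0
df = df.assign(idx=df.groupby('Fruit').cumcount())
print(df[['Fruit', 'idx']])

# Step 2: with no `values` argument, pivot spreads every remaining column,
# producing (column, idx) MultiIndex columns; missing slots become NaN
wide_multi = df.pivot(index='Fruit', columns='idx')
print(wide_multi.columns.tolist())

# Step 3: flatten the column names, keeping the bare name for position 0
wide = wide_multi.set_axis(
    [f'{col}_{i}' if i != 0 else col for col, i in wide_multi.columns],
    axis=1,
).reset_index()
print(wide)

# set_index + unstack is an equivalent way to perform step 2
alt = df.set_index(['Fruit', 'idx']).unstack('idx')
```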