I have a DataFrame with 31 columns, which contains 3 categories "Classic", "Premium" and "Luxe" I want to swap the way the DataFrame works to have only 3 comumns "Classic", "Premium" and "Luxe" and 31 categories which can be listed inside.
Since I'm new I can only post a link to the picture for more clarity : Here is what I have and what I want to do
Here's what I tried so far:
Suppose we use this DataFrame
import pandas as pd
import numpy as np
d = {'Name': ["Alban","Benan","Conor","Dino","Egor","Fatima" ],'Virtual visits': ["Classique", "Classique","","" ,"Premium","Premium"], 'Cars' :["","" ,"Luxe","Luxe","Luxe", "Premium"],'Portraits' :["","" ,"Classique","Classique","Luxe", "Premium"],'Animals' :["Premium","Luxe" ,"","","Luxe", ""]}
df = pd.DataFrame(data=d)
df
Name Virtual visits Cars Portraits Animals
0 Alban Classique Premium
1 Benan Classique Luxe
2 Conor Luxe Classique
3 Dino Luxe Classique
4 Egor Premium Luxe Luxe Luxe
5 Fatima Premium Premium Premium
Using melt looked promising so I did that and a groupBy:
df = df.melt(id_vars=["Name"])
df.groupby(by=["Name"]).sum()
Name variable value
Alban Virtual visitsCarsPortraitsAnimals ClassiquePremium
Benan Virtual visitsCarsPortraitsAnimals ClassiqueLuxe
Conor Virtual visitsCarsPortraitsAnimals LuxeClassique
Dino Virtual visitsCarsPortraitsAnimals LuxeClassique
Egor Virtual visitsCarsPortraitsAnimals PremiumLuxeLuxeLuxe
Fatima Virtual visitsCarsPortraitsAnimals PremiumPremiumPremium
So the column "Variable" is made up of all the column names, which is not bad but instead of a single string added to one another, I would like to have a list, and instead of columns Variable and Value, I would need to have 3 columns, "Premium", "Classque", "Luxe" and in each of them have the relevant list of categories appear.
Thanks for your help.
CodePudding user response:
You were right with melt. After that you need a pivot table:
out = df.replace('',np.nan).melt("Name")\
.pivot_table(index="Name",columns="value",values="variable",aggfunc=','.join)\
.rename_axis(None,axis=1).reset_index()
print(out)
Name Classique Luxe \
0 Alban Virtual visits NaN
1 Benan Virtual visits Animals
2 Conor Portraits Cars
3 Dino Portraits Cars
4 Egor NaN Cars,Portraits,Animals
5 Fatima NaN NaN
Premium
0 Animals
1 NaN
2 NaN
3 NaN
4 Virtual visits
5 Virtual visits,Cars,Portraits
CodePudding user response:
df = df.melt(id_vars=["Name"]).groupby(['Name', 'value']).apply(lambda x: list(x['variable'])).reset_index().pivot(index = 'Name', columns = 'value')
df.columns = df.columns.get_level_values(1)
df = df.drop(columns =['']) # in case you have empty value represented by ''
value Classique Luxe Premium
Name
Alban [Virtual visits] NaN [Animals]
Benan [Virtual visits] [Animals] NaN
Conor [Portraits] [Cars] NaN
Dino [Portraits] [Cars] NaN
Egor NaN [Cars, Portraits, Animals] [Virtual visits]
Fatima NaN NaN [Virtual visits, Cars, Portraits]