I have a dataframe that looks like this
d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'], 'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)
Name | Sports |
---|---|
Sally | Tennis |
Sally | Track & field |
Sally | Dance |
James | Dance |
James | MMA |
James | Crosscountry |
It seems that pandas' pivot_table only allows reshaping with numerical aggregation, but I want to reshape it to wide format such that the strings are in the "values":
Name | First_sport | Second_sport | Third_sport |
---|---|---|---|
Sally | Tennis | Track & field | Dance |
James | Dance | MMA | Crosscountry |
Is there a method in pandas that can help me do this? Thanks!
CodePudding user response:
You can do that, either with .pivot()
if your column / index names are unique, or with .pivot_table()
by providing an aggregation function that works on strings too, e.g. 'first'
.
>>> df['Sport_num'] = 'Sport ' df.groupby('Name').cumcount().astype(str)
>>> df
Name Sports Sport_num
0 Sally Tennis Sport 0
1 Sally Track & field Sport 1
2 Sally Dance Sport 2
3 James Dance Sport 0
4 James MMA Sport 1
5 James Crosscountry Sport 2
>>> df.pivot(index='Name', values='Sports', columns='Sport_num')
Sport_num Sport 0 Sport 1 Sport 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
>>> df.pivot_table(index='Name', values='Sports', columns='Sport_num', aggfunc='first')
Sport_num Sport 0 Sport 1 Sport 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
CodePudding user response:
Another solution:
print(
df.groupby("Name")
.agg(list)["Sports"]
.apply(pd.Series)
.rename(columns={0: "First", 1: "Second", 2: "Third"})
.add_suffix("_sport")
.reset_index()
)
Prints:
Name First_sport Second_sport Third_sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance
CodePudding user response:
We can also use groupby cumcount
in conjunction with set_index
unstack
:
new_df = df.set_index(['Name', df.groupby('Name').cumcount()]).unstack()
new_df
:
Sports
0 1 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
We can do some additional cleanup by renaming and collapsing the MultiIndex:
new_df = (
df.set_index(['Name', df.groupby('Name').cumcount()])
.unstack()
.rename(columns={0: "First", 1: "Second", 2: "Third",
'Sports': 'Sport'})
)
new_df.columns = new_df.columns.swaplevel().map('_'.join)
new_df = new_df.reset_index()
new_df
:
Name First_Sport Second_Sport Third_Sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance
If wanting a programmatic conversion from ints to ordinal words we can use something like inflect:
import inflect
new_df = df.set_index([
'Name', df.groupby('Name').cumcount().add(1)
]).unstack()
# Collapse MultiIndex
p = inflect.engine()
new_df.columns = new_df.columns.map(
# Convert to Ordinal Word and Column to singular noun
lambda c: f'{p.number_to_words(p.ordinal(c[1])).capitalize()}_'
f'{p.singular_noun(c[0])}'
)
new_df = new_df.reset_index()
new_df
:
Name First_Sport Second_Sport Third_Sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance