Pivoting without numerical aggregation/ a numerical column-CodePudding

I have a dataframe that looks like this

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'], 'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)

Name	Sports
Sally	Tennis
Sally	Track & field
Sally	Dance
James	Dance
James	MMA
James	Crosscountry

It seems that pandas' pivot_table only allows reshaping with numerical aggregation, but I want to reshape it to wide format such that the strings are in the "values":

Name	First_sport	Second_sport	Third_sport
Sally	Tennis	Track & field	Dance
James	Dance	MMA	Crosscountry

Is there a method in pandas that can help me do this? Thanks!

CodePudding user response：

You can do that, either with .pivot() if your column / index names are unique, or with .pivot_table() by providing an aggregation function that works on strings too, e.g. 'first'.

>>> df['Sport_num'] = 'Sport '   df.groupby('Name').cumcount().astype(str)
>>> df
    Name         Sports Sport_num
0  Sally         Tennis   Sport 0
1  Sally  Track & field   Sport 1
2  Sally          Dance   Sport 2
3  James          Dance   Sport 0
4  James            MMA   Sport 1
5  James   Crosscountry   Sport 2
>>> df.pivot(index='Name', values='Sports', columns='Sport_num')
Sport_num Sport 0        Sport 1       Sport 2
Name                                          
James       Dance            MMA  Crosscountry
Sally      Tennis  Track & field         Dance
>>> df.pivot_table(index='Name', values='Sports', columns='Sport_num', aggfunc='first')
Sport_num Sport 0        Sport 1       Sport 2
Name                                          
James       Dance            MMA  Crosscountry
Sally      Tennis  Track & field         Dance

CodePudding user response：

Another solution:

print(
    df.groupby("Name")
    .agg(list)["Sports"]
    .apply(pd.Series)
    .rename(columns={0: "First", 1: "Second", 2: "Third"})
    .add_suffix("_sport")
    .reset_index()
)

Prints:

    Name First_sport   Second_sport   Third_sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance

CodePudding user response：

We can also use groupby cumcount in conjunction with set_index unstack:

new_df = df.set_index(['Name', df.groupby('Name').cumcount()]).unstack()

new_df:

       Sports                             
            0              1             2
Name                                      
James   Dance            MMA  Crosscountry
Sally  Tennis  Track & field         Dance

We can do some additional cleanup by renaming and collapsing the MultiIndex:

new_df = (
    df.set_index(['Name', df.groupby('Name').cumcount()])
        .unstack()
        .rename(columns={0: "First", 1: "Second", 2: "Third",
                         'Sports': 'Sport'})
)
new_df.columns = new_df.columns.swaplevel().map('_'.join)
new_df = new_df.reset_index()

new_df:

    Name First_Sport   Second_Sport   Third_Sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance

If wanting a programmatic conversion from ints to ordinal words we can use something like inflect:

import inflect

new_df = df.set_index([
    'Name', df.groupby('Name').cumcount().add(1)
]).unstack()
# Collapse MultiIndex
p = inflect.engine()
new_df.columns = new_df.columns.map(
    # Convert to Ordinal Word and Column to singular noun
    lambda c: f'{p.number_to_words(p.ordinal(c[1])).capitalize()}_'
              f'{p.singular_noun(c[0])}'
)
new_df = new_df.reset_index()

new_df:

    Name First_Sport   Second_Sport   Third_Sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance