Pandas: Transferring column values to separate columns-CodePudding

Relatively new to Python here. I have a dataframe with three columns: Teams, Year and Medals_Won like so:

Teams   Year    Medals_Won
A       2009    Gold
A       2010    Silver
A       2011    Silver
A       2012    Bronze
A       2013    Gold
B       2009    Bronze
B       2010    Gold
B       2011    Bronze
B       2012    Silver
B       2013    Silver
C       2009    Silver
C       2010    Bronze
C       2011    Gold
C       2012    Gold
C       2013    Bronze

I want to take the Medals_Won column and split it based on the type of medals without upsetting the size and order of the first two columns like so (Missing values would be Null/NaN):

Teams   Year    Gold_Winner  Silver_Winner  Bronze_Winner
A       2009    Gold         Null           Null
A       2010    Null         Silver         Null
A       2011    Null         Silver         Null
A       2012    Null         Null           Bronze
A       2013    Gold         Null           Null
B       2009    Null         Null           Bronze
B       2010    Gold         Null           Null
B       2011    Null         Null           Bronze
B       2012    Null         Silver         Null
B       2013    Null         Silver         Null
C       2009    Null         Silver         Null
C       2010    Null         Null           Bronze
C       2011    Gold         Null           Null
C       2012    Gold         Null           Null
C       2013    Null         Null           Bronze

Can someone please help me in solving this problem? I tried to use the insert function with .loc (which did not work):

df.insert(2, 'Gold_Won', df.loc[df[Medals_Won] == 'Gold')
df.insert(3, 'Silver_Won', df.loc[df[Medals_Won] == 'Silver')
df.insert(4, 'Silver_Won', df.loc[df[Medals_Won] == 'Bronze')

CodePudding user response：

here is one way to do it

# get the dummy columns in a separate DF
dmy=pd.get_dummies(df['Medals_Won'])

# replace values with the column names

dmy=dmy.mul(dmy.columns).replace('','Null')

# combine the two DF
pd.concat([df, dmy],
          axis=1
         )

Teams   Year    Medals_Won  Bronze  Gold    Silver
0   A   2009    Gold    Null    Gold    Null
1   A   2010    Silver  Null    Null    Silver
2   A   2011    Silver  Null    Null    Silver
3   A   2012    Bronze  Bronze  Null    Null
4   A   2013    Gold    Null    Gold    Null
5   B   2009    Bronze  Bronze  Null    Null
6   B   2010    Gold    Null    Gold    Null
7   B   2011    Bronze  Bronze  Null    Null
8   B   2012    Silver  Null    Null    Silver
9   B   2013    Silver  Null    Null    Silver
10  C   2009    Silver  Null    Null    Silver
11  C   2010    Bronze  Bronze  Null    Null
12  C   2011    Gold    Null    Gold    Null
13  C   2012    Gold    Null    Gold    Null
14  C   2013    Bronze  Bronze  Null    Null

CodePudding user response：

You can do this two ways

df['Gold_Winner'] = None
df.loc[df.Medals_Won == 'Gold', 'Gold_Winner'] = 'Gold'

you can iterate

 for medal_type in ['Gold', 'Silver', 'Bronze']:
    df[f'{medal_type}_Winner'] = None
    df.loc[df.Medals_Won == medal_type, f'{medal_type}_Winner'] = medal_type

a better way is to use pivot

df.pivot(index=['Teams', 'Year'], columns='Medals_Won', values='Medals_Won')

CodePudding user response：

If you have the dataframe loaded in as df, then I think the following code should get you what you're looking for:

df["Gold_Winner"] = df["Medals_Won"].apply(lambda x: "Gold" if x == "Gold" else None)
df["Silver_Winner"] = df["Medals_Won"].apply(lambda x: "Silver" if x == "Silver" else None)
df["Bronze_Winner"] = df["Medals_Won"].apply(lambda x: "Bronze" if x == "Bronze" else None)
df = df.drop(columns=["Medals_Won"])

It creates one column for each of the gold, silver, and bronze categories and then it drops the Medals_Won category.

CodePudding user response：

As you can see, there are lots of ways to achieve this. This one would hardly be the fastest in this case, but just for reference: you could use str.extract as follows:

tmp = df.Medals_Won.str.extract(r'(?P<Gold_Winner>Gold)|(?P<Silver_Winner>Silver)|(?P<Bronze_Winner>Bronze)')

df[tmp.columns] = tmp
df = df.drop('Medals_Won', axis=1)
print(df)

   Teams  Year Gold_Winner Silver_Winner Bronze_Winner
0      A  2009        Gold           NaN           NaN
1      A  2010         NaN        Silver           NaN
2      A  2011         NaN        Silver           NaN
3      A  2012         NaN           NaN        Bronze
4      A  2013        Gold           NaN           NaN
5      B  2009         NaN           NaN        Bronze
6      B  2010        Gold           NaN           NaN
7      B  2011         NaN           NaN        Bronze
8      B  2012         NaN        Silver           NaN
9      B  2013         NaN        Silver           NaN
10     C  2009         NaN        Silver           NaN
11     C  2010         NaN           NaN        Bronze
12     C  2011        Gold           NaN           NaN
13     C  2012        Gold           NaN           NaN
14     C  2013         NaN           NaN        Bronze