Home > Software design >  Separate values in a DataFrame column into a new columns depending on value
Separate values in a DataFrame column into a new columns depending on value

Time:05-07

I have a DataFrame like below but much larger:

df = pd.DataFrame({'team': ['Mavs', 'Lakers', 'Spurs', 'Cavs', 'Mavs', 'Lakers', 'Spurs', 'Cavs'],
                   'name': ['Dirk', 'Kobe', 'Tim', 'Lebron', 'Kobe', 'Lebron', 'Tim', 'Lebron'],
                   'rebounds': [11, 7, 14, 7, 9, 5,7,12],
                   'points': [26, 31, 22, 29, 23, 56, 84, 23]})

I need to extract the different team names in the teams column into four separate columns retaining the other data in the row, so that the new DataFrame will have seven columns (Mavs, Lakers, Spurs, Cavs). I am sure this is really easy to do but despite looking into it I am at a loss as most of what I have found online involves splitting a string by a delimiter rather than a value.

A DataFrame that has seven columns (Mavs, Lakers, Spurs, Cavs, name, rebounds, points). The team columns that have been split out can just have the name of the team in so something like:

Mavs Lakers Spurs Cavs name rebounds points
Mavs Dirk 11 26
Mavs Kobe 9 23
Lakers Kobe 7 31
Lakers Lebron 5 56

and so on

Many thanks in advance. I would post an image but stack overflow doesn't seem to be letting me.

CodePudding user response:

You can try pivot

df_ = df.pivot(index=['name', 'rebounds', 'points'], columns='team', values='team').reset_index().fillna('')
print(df_)

team    name  rebounds  points  Cavs  Lakers  Mavs  Spurs
0       Dirk        11      26                Mavs
1       Kobe         7      31        Lakers
2       Kobe         9      23                Mavs
3     Lebron         5      56        Lakers
4     Lebron         7      29  Cavs
5     Lebron        12      23  Cavs
6        Tim         7      84                      Spurs
7        Tim        14      22                      Spurs

CodePudding user response:

You can use get_dummies:

df2 = pd.get_dummies(df, columns=['team'], prefix='team')

It will fill in the columns with 1s and 0s.

If you really want the output that you say (with the name of the team, or empty), you can then iterate over those columns and replace the 1s and 0s with what you want:

for team in df['team'].unique():
    df2['team_'   team].replace({0: '', 1: team}, inplace=True)
  • Related