I have a DataFrame like below but much larger:
df = pd.DataFrame({'team': ['Mavs', 'Lakers', 'Spurs', 'Cavs', 'Mavs', 'Lakers', 'Spurs', 'Cavs'],
'name': ['Dirk', 'Kobe', 'Tim', 'Lebron', 'Kobe', 'Lebron', 'Tim', 'Lebron'],
'rebounds': [11, 7, 14, 7, 9, 5,7,12],
'points': [26, 31, 22, 29, 23, 56, 84, 23]})
I need to extract the different team names in the teams column into four separate columns retaining the other data in the row, so that the new DataFrame will have seven columns (Mavs, Lakers, Spurs, Cavs). I am sure this is really easy to do but despite looking into it I am at a loss as most of what I have found online involves splitting a string by a delimiter rather than a value.
A DataFrame that has seven columns (Mavs, Lakers, Spurs, Cavs, name, rebounds, points). The team columns that have been split out can just have the name of the team in so something like:
Mavs | Lakers | Spurs | Cavs | name | rebounds | points |
---|---|---|---|---|---|---|
Mavs | Dirk | 11 | 26 | |||
Mavs | Kobe | 9 | 23 | |||
Lakers | Kobe | 7 | 31 | |||
Lakers | Lebron | 5 | 56 |
and so on
Many thanks in advance. I would post an image but stack overflow doesn't seem to be letting me.
CodePudding user response:
You can try pivot
df_ = df.pivot(index=['name', 'rebounds', 'points'], columns='team', values='team').reset_index().fillna('')
print(df_)
team name rebounds points Cavs Lakers Mavs Spurs
0 Dirk 11 26 Mavs
1 Kobe 7 31 Lakers
2 Kobe 9 23 Mavs
3 Lebron 5 56 Lakers
4 Lebron 7 29 Cavs
5 Lebron 12 23 Cavs
6 Tim 7 84 Spurs
7 Tim 14 22 Spurs
CodePudding user response:
You can use get_dummies
:
df2 = pd.get_dummies(df, columns=['team'], prefix='team')
It will fill in the columns with 1s and 0s.
If you really want the output that you say (with the name of the team, or empty), you can then iterate over those columns and replace the 1s and 0s with what you want:
for team in df['team'].unique():
df2['team_' team].replace({0: '', 1: team}, inplace=True)