How can I change the name of columns from a dataframe
match_id player venue goals
1 John home 4
1 Jim home 0
1 Will away 0
1 Dan away 1
...
based on a specific cell value (from column 'venue'), and end up with:
match_id home_player_1 home_player_1_goals home_player_2 home_player_2_goals away_player_1 away_player_1_goals away_player_2 away_player_2_goals
1 John 4 Jim 0 Will 0 Dan 1
CodePudding user response:
I was able to get there with creating two helper columns, which would be your new header structure, and concatenating back two pivoted versions of your dataframe:
cc = df.groupby('venue').cumcount().add(1).astype(str)
df = df.assign(
c = df['venue'] "_player_" cc,
c_goals = df['venue'] "_player_" cc "_goals")
players = df.pivot_table(index='match_id',columns = 'c',aggfunc='first',values='player')
goals = df.pivot_table(index='match_id',columns = 'c_goals',aggfunc='first',values='goals')
out = pd.concat(
[players,goals]
,axis=1).sort_index(axis=1)
>>> print(out)
away_player_1 away_player_1_goals ... home_player_2 home_player_2_goals
match_id ...
1 Will 0 ... Jim 0
[1 rows x 8 columns]
CodePudding user response:
A simple loop can also do the trick. Less concise, but still readable
In [32]: df
Out[32]:
match_id player venue goals
0 1 john h 4
1 1 jim h 0
2 1 will a 0
3 1 da a 1
In [33]: for match in df.match_id.unique():
...: df_m = df.query('match_id==%d' % match)
...: counts = {v:0 for v in df_m.venue.unique()}
...: records = []
...: for i, row in df_m.iterrows():
...: counts[row.venue] = 1
...: count = counts[row.venue]
...: name_record = "%s_player_%d" % (row.venue, count), row.player
...: goal_record = "%s_player_%d_goals" % (row.venue, count), row.goals
...: records = [name_record, goal_record]
...: df_r = pandas.DataFrame(dict(records), index=[0])
...: df_r['match_id'] = match
...:
In [34]: df_r
Out[34]:
h_player_1 h_player_1_goals h_player_2 ... a_player_2 a_player_2_goals match_id
0 john 4 jim ... da 1 1
[1 rows x 9 columns]