Home > Back-end >  Pandas - rename columns based on cell value
Pandas - rename columns based on cell value

Time:09-23

How can I change the name of columns from a dataframe

match_id player venue goals
1        John   home  4
1        Jim    home  0
1        Will   away  0 
1        Dan    away  1
...

based on a specific cell value (from column 'venue'), and end up with:

match_id home_player_1  home_player_1_goals home_player_2 home_player_2_goals away_player_1 away_player_1_goals away_player_2 away_player_2_goals
1        John           4                   Jim            0                    Will        0      Dan     1 
                                                               

CodePudding user response:

I was able to get there with creating two helper columns, which would be your new header structure, and concatenating back two pivoted versions of your dataframe:

cc = df.groupby('venue').cumcount().add(1).astype(str)
 
df = df.assign(
    c = df['venue']   "_player_"   cc,
    c_goals = df['venue']   "_player_"   cc   "_goals")

players = df.pivot_table(index='match_id',columns = 'c',aggfunc='first',values='player')
goals = df.pivot_table(index='match_id',columns = 'c_goals',aggfunc='first',values='goals')    

out = pd.concat(
    [players,goals]
    ,axis=1).sort_index(axis=1)

>>> print(out)

          away_player_1  away_player_1_goals  ... home_player_2  home_player_2_goals
match_id                                     ...                                   
1                 Will                    0  ...           Jim                    0

[1 rows x 8 columns]

CodePudding user response:

A simple loop can also do the trick. Less concise, but still readable

In [32]: df
Out[32]: 
   match_id player venue  goals
0         1   john     h      4
1         1    jim     h      0
2         1   will     a      0
3         1     da     a      1

In [33]: for match in df.match_id.unique():
    ...:   df_m = df.query('match_id==%d' % match)
    ...:   counts = {v:0 for v in df_m.venue.unique()}
    ...:   records = []
    ...:   for i, row in df_m.iterrows():
    ...:     counts[row.venue]  = 1
    ...:     count = counts[row.venue]
    ...:     name_record = "%s_player_%d" % (row.venue, count), row.player
    ...:     goal_record = "%s_player_%d_goals" % (row.venue, count), row.goals
    ...:     records  = [name_record, goal_record]
    ...:   df_r = pandas.DataFrame(dict(records), index=[0])
    ...:   df_r['match_id'] = match
    ...:

In [34]: df_r
Out[34]: 
  h_player_1  h_player_1_goals h_player_2  ...  a_player_2 a_player_2_goals  match_id
0       john                 4        jim  ...          da                1         1

[1 rows x 9 columns]

  • Related