Home > Blockchain >  Pandas - Combine multiple group rows into one row
Pandas - Combine multiple group rows into one row

Time:08-06

I have been banging my head against a wall for a while now trying to figure out this seemingly easy data manipulation task in Pandas, however I have had no success figuring out how to do it or googling a sufficient answer :(

All I want to do is take the table on the left of the snip below (will be a pandas dataframe) and convert it into the table on the right (to become another pandas dataframe).

enter image description here

Code for creating the initial dataframe:

import pandas as pd

test_data = pd.DataFrame(
        {
            'team': [1,1,2,2,3,3,4,4,5,5] ,
            'player': ['a','b','c','d','e','f','g','h','i','j'] ,
            'score': [10,22,66,44,1,3,55,6,4,2]
        }
)

Thank you for your help in advance!

CodePudding user response:

try this,

test_data.groupby('team').agg({'player':['first', 'last'], 'score': ['first', 'last']})

O/P:

    player_first player_last  score_first  score_last
team                                                  
1               a           b           10          22
2               c           d           66          44
3               e           f            1           3
4               g           h           55           6
5               i           j            4           2

Complete solution:

test_data = test_data.groupby('team').agg({'player':['first', 'last'], 'score': ['first', 'last']})
test_data.columns = ['_'.join(x) for x in test_data.columns]
test_data = test_data.reset_index()
test_data = test_data[['team', 'player_first', 'score_first', 'player_last', 'score_last']]

O/P:

   team player_first  score_first player_last  score_last
0     1            a           10           b          22
1     2            c           66           d          44
2     3            e            1           f           3
3     4            g           55           h           6
4     5            i            4           j           2​
  • What you need is groupby and aggregation ops of first and last
  • set column names
  • reset index and re order columns

CodePudding user response:

Assuming your dataframe is sorted by team and you know there's exactly 2 entries for each team, you can use:

player_1_data = test_data.iloc[::2, :]
player_2_data = test_data.iloc[1::2, :]

player_1_data.set_index("team")
.add_suffix("1")
.join(player_2_data.set_index("team").add_suffix("2"))
.reset_index()
.rename({"score1": "player1_score", "score2": "player2_score"}, axis=1)

Here, we just pick up alternating rows and join them together to give:

   team player1  player1_score player2  player2_score
0     1       a             10       b             22
1     2       c             66       d             44
2     3       e              1       f              3
3     4       g             55       h              6
4     5       i              4       j              2

CodePudding user response:

You can do in this way:

import pandas as pd

df = pd.DataFrame(
        {
            'team': [1,1,2,2,3,3,4,4,5,5] ,
            'player': ['a','b','c','d','e','f','g','h','i','j'] ,
            'score': [10,22,66,44,1,3,55,6,4,2]
        }
)
grouped_df = df.groupby('team').agg({ 'player': ['first', 'last'], 'score': ['first', 'last']})
grouped_df.columns = [ 'player_first', 'score_first', 'player_last', 'score_last']
print(grouped_df) 
  • Related