I am exploring a rugby dataset where I am stuck on a bit of an issue. The dataset contains matches played in the the Six Nations over a year, and the tries are put into two separate columns, which is "Home try's" and "Away try's". I am trying to print out the output of the teams with the largest amount of try`s. Below is an example of how the dataset looks.
Home | Away | Home try's | Away try's |
---|---|---|---|
England | France | 3 | 2 |
Ireland | Scotland | 1 | 4 |
France | Ireland | 1 | 0 |
Scotland | England | 1 | 3 |
I have tried to put the home and away columns to a single column as a list and use the arguments df.explode and df.groupby argument to create a dataframe, but it doesnt work. Any tips on how I could solve this?
The ideal output would be:
England | Scotland | France | Ireland |
---|---|---|---|
6 | 5 | 3 | 1 |
CodePudding user response:
How about this?
series_home = df.groupby('Home').sum()["Home try's"]
series_away = df.groupby('Away').sum()["Away try's"]
combined_series = series_home series_away
combined_df = pd.DataFrame(combined_series).transpose()
CodePudding user response:
I was able to do this in a way combining the two DFs and using a pivot.
df_Home = df[['Home', 'Home_Try']]
df_Away = df[['Away', 'Away_Try']]
df_Away.columns = [['Team', 'Tries']]
df_Home.columns = [['Team', 'Tries']]
df2 = pd.concat([df_Home, df_Away])
df2.columns = ['Team', 'Tries']
df_pivot = pd.pivot_table(df2, values='Tries', columns=['Team'], aggfunc=np.sum)
df_pivot