Home > Blockchain >  How to find ratio of values in two rows that have the same identifier using python dataframes
How to find ratio of values in two rows that have the same identifier using python dataframes

Time:02-21

I have a dataframe with 4858 rows and 67 columns. This contains the stats from each game in the season for each MLB team. This means that for every game, there are two rows of data. One with the stats from one team and the other with the stats from the team they played. Here are the column names: ['AB', 'R', 'H', 'RBI', 'BB', 'SO', 'PA', 'BA', 'OBP', 'SLG', 'OPS', 'Pit', 'Str', 'RE24', 'WinOrLoss', 'Team', 'Opponent', 'HomeOrAway', 'url', 'Win_Percentage', 'R_Season_Long_Count', 'H_Season_Long_Count', 'BB_Season_Long_Count', 'SO_Season_Long_Count', 'PA_Season_Long_Count', 'R_Moving_Average_3', 'R_Moving_Average_10', 'R_Moving_Average_31', 'SLG_Moving_Average_3', 'SLG_Moving_Average_10', 'SLG_Moving_Average_31', 'BA_Moving_Average_3', 'BA_Moving_Average_10', 'BA_Moving_Average_31', 'OBP_Moving_Average_3', 'OBP_Moving_Average_10', 'OBP_Moving_Average_31', 'SO_Moving_Average_3', 'SO_Moving_Average_10', 'SO_Moving_Average_31', 'AB_Moving_Average_3', 'AB_Moving_Average_10', 'AB_Moving_Average_31', 'Pit_Moving_Average_3', 'Pit_Moving_Average_10', 'Pit_Moving_Average_31', 'H_Moving_Average_3', 'H_Moving_Average_10', 'H_Moving_Average_31', 'BB_Moving_Average_3', 'BB_Moving_Average_10', 'BB_Moving_Average_31', 'OPS_Moving_Average_3', 'OPS_Moving_Average_10', 'OPS_Moving_Average_31', 'RE24_Moving_Average_3', 'RE24_Moving_Average_10', 'RE24_Moving_Average_31', 'Win_Percentage_Moving_Average_3', 'Win_Percentage_Moving_Average_10', 'Win_Percentage_Moving_Average_31', 'BA_Season_Long_Average', 'SLG_Season_Long_Average', 'OPS_Season_Long_Average']

Then, here is a picture of the output from these columns. Sorry, it's only from a few columns but essentially all the stats will just be numbers like this.

enter image description here

The most important column for this question is the url column. This column identifies the game played as there is only one unique url for each game. However, there will be two rows within the dataframe that have this unique url as one will contain the stats from one team in that game and the other will contain the stats from the other team also in that game.

Now, what I am wanting to do is to combine these two rows that are identified by the common url by creating a ratio between them. So, I would like to divide the stats from the first team by the stats from the second team for that specific game with the unique url. I want to do this for each game/unique url. I am able to sum them by using the groupby.sum() function, but I am unsure how to find the ratio between the two rows with the same url. I would really appreciate any suggestions. Thanks so much!

CodePudding user response:

Assumptions:

  • always 2 rows for each url
  • in each url, among the 2 rows, you don't care which is divided by which

A small example of your dataset:

df = pd.DataFrame({
    'url': ['1', '1', '2', '2', '3', '3'],
    'non-stat1': np.arange(1., 7.),
    'non-stat2': np.arange(2., 8.),
    'stat1': np.arange(13., 19.),
    'stat2': np.arange(6., 12.),
})

This lists the columns which are the stats that you want to apply division.

columns_for_ratio = ['stat1', 'stat2']

This is how the division works. .values get you an array which always has two rows, so you can unpack the array into two variables, one array for each.

def divide(two_rows):
    x, y = two_rows.values
    return pd.Series(x/y, two_rows.columns)

And finally do the division

df.groupby('url')[columns_for_ratio].apply(divide)
  • Related