Home > OS >  What's the best way for looping through pandas df and comparing 2 different dataframes then per
What's the best way for looping through pandas df and comparing 2 different dataframes then per

Time:12-13

I'm currently writing Python code that compares offensive and defensive stats in basketball and I want to be able to create weights with the given stats. I have my stats saved in a dataframe according to: team, position, and other numerical stats. I want to be able to loop through each team and their respective positions and corresponding stats. e.g.:

['DAL', 'C', 0.0, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4] vs ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]

So I would like to compare BOS vs DAL at the C position and compare points, rebounds, assists etc. If one is greater than the other then divide the greater by the lesser.

The best thing I have so far is to convert the the dataframes to numpy and then proceed to loop through those and append into a blank list:

df1 = df1.to_numpy()
df2 = df2.to_numpy()

df1_array = []
df2_array = []

for x in range(len(df1)):
    for a, h in zip(away, home):
        if df1[x][0] == a or df1[x][0] == h:
            df1_array.append(df1[x])

After I get the new arrays I would then loop through them again to compare values, however I feel like this is too rudimentary. What could be a more efficient or smarter way of executing this?

CodePudding user response:

Using numpy´s function called where is easy to do it:

import pandas as pd
import numpy as np

# Creating the dataframe
team1 = ['DAL', 'C', 0.1, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4] 
team2 = ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]
df = pd.DataFrame(
    {'team1':team1,
     'team2':team2,
    })

# Select the rows that contain numbers
df2 = df.iloc[2:].copy()

# Make the comparison, if team1 is larger than team2 then team1/team2 and viseversa.
df2['result'] = np.where(df2['team1']>df2['team2'], \
                         df2['team1']/df2['team2'], \
                         df2['team2']/df2['team1'])

df['result'] = df2['result']

print(df)

This yields

  team1 team2    result
0   DAL   BOS       NaN
1     C     C       NaN
2   0.1   1.7      17.0
3   3.0   6.0       2.0
4   0.5   2.1       4.2
5   0.4   0.1       4.0
6   0.5   0.7       1.4
7   0.7   1.9  2.714286
8   6.4   9.0   1.40625

Becareful with the 0 in the first column of values in your problem description though, I changed it to 0.1 as otherwise it will give zero division error.

  • Related