I'm currently writing Python code that compares offensive and defensive stats in basketball and I want to be able to create weights with the given stats. I have my stats saved in a dataframe according to: team, position, and other numerical stats. I want to be able to loop through each team and their respective positions and corresponding stats. e.g.:
['DAL', 'C', 0.0, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4]
vs ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]
So I would like to compare BOS vs DAL at the C position and compare points, rebounds, assists etc. If one is greater than the other then divide the greater by the lesser.
The best thing I have so far is to convert the the dataframes to numpy and then proceed to loop through those and append into a blank list:
df1 = df1.to_numpy()
df2 = df2.to_numpy()
df1_array = []
df2_array = []
for x in range(len(df1)):
for a, h in zip(away, home):
if df1[x][0] == a or df1[x][0] == h:
df1_array.append(df1[x])
After I get the new arrays I would then loop through them again to compare values, however I feel like this is too rudimentary. What could be a more efficient or smarter way of executing this?
CodePudding user response:
Using numpy´s function called where is easy to do it:
import pandas as pd
import numpy as np
# Creating the dataframe
team1 = ['DAL', 'C', 0.1, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4]
team2 = ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]
df = pd.DataFrame(
{'team1':team1,
'team2':team2,
})
# Select the rows that contain numbers
df2 = df.iloc[2:].copy()
# Make the comparison, if team1 is larger than team2 then team1/team2 and viseversa.
df2['result'] = np.where(df2['team1']>df2['team2'], \
df2['team1']/df2['team2'], \
df2['team2']/df2['team1'])
df['result'] = df2['result']
print(df)
This yields
team1 team2 result
0 DAL BOS NaN
1 C C NaN
2 0.1 1.7 17.0
3 3.0 6.0 2.0
4 0.5 2.1 4.2
5 0.4 0.1 4.0
6 0.5 0.7 1.4
7 0.7 1.9 2.714286
8 6.4 9.0 1.40625
Becareful with the 0 in the first column of values in your problem description though, I changed it to 0.1 as otherwise it will give zero division error.