I have two pandas dataframes and want to apply a function "Stat / Percentage Played". The first data frame - "Stat" looks like this:
- ----- ------- ------- ------- -------
0 Name Round 1 Round 2 Round 3 Round 4
1 Tom 10 22 18 21
2 Angus 13 16 21 19
3 Jack 18 19 16 17
4 Harry 21 19 15 19
5 Ben 19 24 22 27
- ----- ------- ------- ------- -------
The second dataframe shows the amount of time (percentage) a player has spent on the field, and looks like this:
- ----- ------- ------- ------- -------
0 Name Round 1 Round 2 Round 3 Round 4
1 Tom 65 80 81 80
2 Angus 72 76 85 79
3 Jack 79 81 68 75
4 Harry 79 78 62 77
5 Ben 81 84 76 85
- ----- ------- ------- ------- -------
With these I want to applymap to each cell from both dataframes the get the corresponding player stat from that round to be divided by the playing time pertage for that player in the round. The function being applied for each cell would look like so:
- ----- ------- ------- ------- -------
0 Name Round 1 Round 2 Round 3 Round 4
1 Tom 10/65 22/80 18/81 21/80
2 Angus 13/72 16/76 21/85 19/79
3 Jack 18/79 19/81 16/68 17/75
4 Harry 21/79 19/78 15/62 19/77
5 Ben 19/81 24/84 22/76 27/85
- ----- ------- ------- ------- -------
Furthermore the final values in my final dataframe should look like this:
- ----- ------------------- ------------------- ------------------- -------------------
0 Name Round 1 Round 2 Round 3 Round 4
1 Tom 0.15384615384615385 0.275 0.2222222222222222 0.2625
2 Angus 0.18055555555555555 0.21052631578947367 0.24705882352941178 0.24050632911392406
3 Jack 0.22784810126582278 0.2345679012345679 0.23529411764705882 0.22666666666666666
4 Harry 0.26582278481012656 0.24358974358974358 0.24193548387096775 0.24675324675324675
5 Ben 0.2345679012345679 0.2857142857142857 0.2894736842105263 0.3176470588235294
- ----- ------------------- ------------------- ------------------- -------------------
The only way I can figure out a method is by converting to lists and using loops to complete this function, for example:
datlist = df.values.tolist()
headerList = datlist[:1] # Sava Header row to merge after calulcations
statList = datlist[1:] # Remove Header row for calculations
perlist = dfPerc.values.tolist()[1:] # Remove Header row for calculations
AdjList = []
for z in range(len(statList)):
PlayerList = []
for i in range(len(statList[z])):
if isinstance(statList[z][i], int) == True:
adjStat = statList[z][i] / perlist[z][i]
else:
adjStat = statList[z][i]
PlayerList.append(adjStat)
AdjList.append(PlayerList)
AdjList.insert(0, headerList[0])
finaldf = pd.DataFrame(AdjList)
print(finaldf)
Ouput:
- ----- ------------------- ------------------- ------------------- -------------------
0 Name Round 1 Round 2 Round 3 Round 4
1 Tom 0.15384615384615385 0.275 0.2222222222222222 0.2625
2 Angus 0.18055555555555555 0.21052631578947367 0.24705882352941178 0.24050632911392406
3 Jack 0.22784810126582278 0.2345679012345679 0.23529411764705882 0.22666666666666666
4 Harry 0.26582278481012656 0.24358974358974358 0.24193548387096775 0.24675324675324675
5 Ben 0.2345679012345679 0.2857142857142857 0.2894736842105263 0.3176470588235294
- ----- ------------------- ------------------- ------------------- -------------------
However this method seems very inefficient and creates temporary lists which I feel is unnecessary. Is there a better method where I can complete this calculation without using lists and loops - for example the applymap function?
CodePudding user response:
Use pandas vectorization:
>>> columns = ['Round 1', 'Round 2', 'Round 3', 'Round 4']
>>> final = df1[columns] / df2[columns]
You can also use filter
>>> df1.filter(like='Round')/df2.filter(like='Round')
Can get the names back by assigning:
>>> final['Name'] = df1['Name']