More efficient way to write this lambda function

import pandas as pd

prizes = ([1, 100], [2, 50], [3, 25])
prizes = pd.DataFrame(prizes, columns=['Rank', 'Payout'])

ranking = ([1, 3, 2], [2, 2, 1], [3, 1, 3])
ranking = pd.DataFrame(ranking, columns=[1, 2, 3])

payouts = pd.DataFrame(range(1, 4), columns=['Lineup'])
mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat([payouts, ranking[range(1, 4)].apply(lambda s: s.map(mapper)).fillna(-1)], axis=1)

print(ranking)
print(payouts)

   1  2  3
0  1  3  2
1  2  2  1
2  3  1  3
   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Is there a more efficient way to write the lambda function just above the print statements? This is a small example of what I'm using it for inside a large loop, and this one portion takes roughly half of the loop's total run time. Any help would be appreciated.

CodePudding user response：

You don't need to create a dict for mapper: setting the index and selecting the column gives a Series, which pandas treats much like a dict here. As for your question, you can use replace instead; it should be faster, though it is worth timing on your own data:

mapper = prizes.set_index('Rank')['Payout']

pd.concat([payouts, ranking.replace(mapper)], axis=1)

   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Your example doesn't show the need for fillna; you could add rows to your data that illustrate such a scenario. Also, since payouts is just a single column, you could create it as a Series instead, which may give a small performance gain.
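To illustrate when fillna would matter, here is a minimal sketch with made-up data in which rank 4 has no entry in the prize table (that rank and the modified ranking frame are assumptions, not part of the question). With map, unmatched ranks become NaN and can be filled with -1; replace, by contrast, leaves unmatched values unchanged, so the two approaches are not interchangeable in that case.

```python
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=['Rank', 'Payout'])

# Hypothetical data: rank 4 does not appear in the prize table.
ranking = pd.DataFrame([[1, 4, 2], [2, 2, 1], [4, 1, 3]], columns=[1, 2, 3])

mapper = prizes.set_index('Rank')['Payout']

# Series.map yields NaN for ranks missing from mapper; fillna(-1) marks them,
# and astype(int) undoes the float conversion that NaN forces.
mapped = ranking.apply(lambda s: s.map(mapper)).fillna(-1).astype(int)
print(mapped)
```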

CodePudding user response：

Here is an even faster (but less concise) solution using the underlying numpy array. There is a ~1.7x gain compared to replace.

a = prizes.set_index('Rank')['Payout'].values  # payouts by position (assumes ranks are 1..N, in order)
b = ranking.values - 1  # convert ranks 1/2/3 to 0-based positions 0/1/2
c = a.take(b.flatten()).reshape(b.shape)  # index in 1D, then reshape to 2D
pd.DataFrame(c, columns=ranking.columns)

NB. I broke the steps down for clarity, but this could be done without the intermediate variables.
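As a rough sketch of the compact form, NumPy integer-array indexing returns a result with the same shape as the index array, so the flatten/reshape step can be dropped. This assumes the ranks are exactly 1..N with one payout each; the sort_values call and the names lookup and result are my additions, not part of the answer above.

```python
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=['Rank', 'Payout'])
ranking = pd.DataFrame([[1, 3, 2], [2, 2, 1], [3, 1, 3]], columns=[1, 2, 3])

# Sorting by Rank guarantees that position i holds the payout for rank i + 1.
lookup = prizes.sort_values('Rank')['Payout'].to_numpy()

# Indexing a 1D array with a 2D integer array gives a 2D array of the same shape.
result = pd.DataFrame(lookup[ranking.to_numpy() - 1],
                      columns=ranking.columns, index=ranking.index)
print(result)
```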

Output:

     1    2    3
0  100   25   50
1   50   50  100
2   25  100   25
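If some ranks may fall outside the prize table (the case the original fillna(-1) guards against), one possible variation is a lookup array pre-filled with the default value. This is only a sketch under the assumption that ranks are small non-negative integers; the rank 4 in the data and the names lookup and payout_table are hypothetical.

```python
import numpy as np
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=['Rank', 'Payout'])

# Hypothetical data: rank 4 has no payout defined.
ranking = pd.DataFrame([[1, 4, 2], [2, 2, 1], [4, 1, 3]], columns=[1, 2, 3])

ranks = ranking.to_numpy()

# One slot per possible rank value, defaulting to -1 for ranks without a prize.
size = max(ranks.max(), prizes['Rank'].max()) + 1
lookup = np.full(size, -1, dtype=np.int64)
lookup[prizes['Rank'].to_numpy()] = prizes['Payout'].to_numpy()

# Index the lookup array directly with the 2D array of ranks.
payout_table = pd.DataFrame(lookup[ranks], columns=ranking.columns, index=ranking.index)
print(payout_table)
```

Because the lookup array is indexed by the rank value itself, there is no need to subtract 1, and the result stays an integer array rather than being converted to float by NaN.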