More efficient way to write this lambda function

Time:09-17

import pandas as pd

prizes = ([1, 100], [2, 50], [3, 25])
prizes = pd.DataFrame(prizes, columns=['Rank', 'Payout'])

ranking = ([1, 3, 2], [2, 2, 1], [3, 1, 3])
ranking = pd.DataFrame(ranking, columns=[1, 2, 3])

payouts = pd.DataFrame(range(1, 4), columns=['Lineup'])
mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat([payouts, ranking[range(1, 4)].apply(lambda s: s.map(mapper)).fillna(-1)], axis=1)

print(ranking)
print(payouts)

   1  2  3
0  1  3  2
1  2  2  1
2  3  1  3
   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Is there a more efficient way to write the lambda function just above the print statements? This is a small example of what I use inside a large loop, where this one step takes roughly half of the loop's total running time. Any help would be appreciated.

CodePudding user response:

You don't need to create a dict for mapper; setting the index and selecting the column as a Series suffices (a Series behaves like a dict here). As for your question, you can use replace instead, which should be faster:

mapper = prizes.set_index('Rank')['Payout']

pd.concat([payouts, ranking.replace(mapper)], axis=1)

   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Your example doesn't show the need for fillna; you could add extra rows to your data to cover such a scenario. Also, since payouts is just a single column, you could create a Series instead, which may give some performance gain.
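To illustrate the fillna point, here is a minimal sketch using a hypothetical rank 4 that has no prize (this value is not in the original data). It suggests one difference worth knowing before swapping map for replace: with map an unmatched rank becomes NaN, so fillna(-1) takes effect, whereas replace leaves an unmatched value unchanged.

```python
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=['Rank', 'Payout'])
mapper = prizes.set_index('Rank')['Payout']

# Hypothetical data: rank 4 has no entry in prizes
ranking = pd.DataFrame([[1, 4, 2], [2, 2, 1], [4, 1, 3]], columns=[1, 2, 3])

# map: unmatched ranks become NaN, which fillna then turns into -1
mapped = ranking.apply(lambda s: s.map(mapper)).fillna(-1)

# replace: unmatched ranks are left as they are (the 4 stays a 4)
replaced = ranking.replace(mapper)

print(mapped)
print(replaced)
```

If every rank is guaranteed to have a prize, the two approaches give the same values; otherwise replace would need an extra step to reproduce the -1 default.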

CodePudding user response:

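Here is an even faster (but less concise) solution using the underlying NumPy array. It is roughly 1.7x faster than replace. Note that it assumes the ranks are consecutive integers starting at 1, in the same order as the rows of prizes.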

a = prizes.set_index('Rank')['Payout'].values
b = ranking.values-1 # get index as 0/1/2
c = a.take(b.flatten()).reshape(b.shape) # index in 1D and reshape to 2D
pd.DataFrame(c, columns=ranking.columns)

NB. I broke the steps down for clarity, but this could be done without the intermediate variables.

Output:

     1    2    3
0  100   25   50
1   50   50  100
2   25  100   25
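As the NB says, the intermediate variables can be dropped. A minimal sketch of a condensed form, assuming (as in the question's data) that Rank is exactly 1..N in sorted order; the names payout_values and result are only illustrative. Indexing a 1D NumPy array with a 2D integer array returns a 2D array directly, so the flatten/reshape step is not needed here.

```python
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=['Rank', 'Payout'])
ranking = pd.DataFrame([[1, 3, 2], [2, 2, 1], [3, 1, 3]], columns=[1, 2, 3])

# Assumes Rank is exactly 1..N in sorted order, so rank r sits at position r - 1
payout_values = prizes['Payout'].to_numpy()

# Indexing a 1D array with a 2D integer array yields a 2D array of the same shape
result = pd.DataFrame(payout_values[ranking.to_numpy() - 1],
                      columns=ranking.columns, index=ranking.index)
print(result)
```

A rank outside 1..N would raise an IndexError (or, for rank 0, silently wrap around to the last prize), so this form fits best when the ranks are known to be valid.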