Home > Software design >  How to lookup in python between 2 dataframes with match mode -> an exact match or the next larger
How to lookup in python between 2 dataframes with match mode -> an exact match or the next larger

Time:12-03

I'd like to create a lookup (similar to excel for example) with match mode -> an exact match or the next larger item.

Let's say I have these 2 dataframes:

seed(1)
np.random.seed(1) 
Wins_Range = np.arange(1,101,1)
Wins = pd.DataFrame({"Wins Needed": Wins_Range})

Wins


    Wins Needed
0   1
1   2
2   3
3   4
4   5
... ...
95  96
96  97
97  98
98  99
99  100

And the second one:

Levels_Range = np.arange(1,101,1)
Levels = pd.DataFrame({"Level": Levels_Range})
Levels["Wins"]=np.random.choice([1,2,3,4,5],size=len(Levels), p=[0.2,0.2,0.2,0.2,0.2]).cumsum()

Levels

    Level   Wins
0   1   3
1   2   7
2   3   8
3   4   10
4   5   11
... ... ...
95  96  281
96  97  286
97  98  289
98  99  290
99  100 294

Now, I'd like to pull the level from Levels df to the Wins df when the condition is Wins Needed=Wins but as I said - the match mode will be an exact match or the next larger item. BTW - the type of Levels["Wins"] is float and the type of Wins["Win"] is int if that matters.

I've tried to use the merge function but it doesn't work (I'm new at python) -

Wins.merge(Levels, on='Wins Needed', how='left')

Thanks in advance!

CodePudding user response:

You need a merge_asof:

out = pd.merge_asof(Wins, Levels, left_on='Wins Needed', right_on='Wins',
                    direction='forward')[['Wins Needed', 'Level']]

Or

Wins['Level'] = pd.merge_asof(Wins, Levels, left_on='Wins Needed', right_on='Wins',
                              direction='forward')['Level']

NB. the keys must be sorted for a merge_asof.

Output:

    Wins Needed  Level
0             1      1
1             2      1
2             3      1
3             4      2
4             5      2
..          ...    ...
95           96     35
96           97     35
97           98     36
98           99     36
99          100     37

[100 rows x 2 columns]

If the values are not initially sorted:

Wins['Level'] = pd.merge_asof(Wins[['Wins Needed']].reset_index().sort_values(by='Wins Needed'),
                              Levels.sort_values(by='Wins'),
                              left_on='Wins Needed', right_on='Wins',
                              direction='forward').set_index('index')['Level']
  • Related