Home > database >  How to lookup a specific value in a range of one DataFrame an put in in another
How to lookup a specific value in a range of one DataFrame an put in in another

Time:11-24

df1:

**Tarif    von    bis   GK**
FedEx    0.0    1.0  G001
FedEx    1.0    2.0  G002
...
DHL.     0.0    0.5  G001
DHL.     0.5    1.0  G002
...
DPD      0.0    5.0  G001
DPD      5.0    10.0 G002

df2:

**Tarif   Weight  GK**

FedEx     0.6
DHL       0.6
FedEx     0.5
DPD       7.5

My attempt:

for i in range(len(df2)):
      
        df2.loc[[i]['GK'] = df1['GK'].loc[(df1['Tarif'] == df2.loc[[i]]['Tarif'])
                                & (df1['von'] <  df2[[i]]['Weight']) 
                                & (df1['bis'] >= df2[[i]]['Weight'])]
ValueError: Can only compare identically-labeled Series objects*

Result should be

df2:

**Tarif   Weight  GK****

FedEx     0.6.    G001
DHL       0.6.    G002
FedEx     0.5.    G001
DPD       3.5.    G002

CodePudding user response:

Another possible solution, which is based on the following ideas:

  1. Merge the two dataframes as usual with pandas.DataFrame.merge.

  2. Filter out the cases that do not satisfy the conditions.

out = df2.iloc[:,:2].merge(df1, on='Tarif')
out = out.loc[out['von'].lt(out['Weight']) & out['bis'].ge(out['Weight'])]
out = out.reset_index(drop=True)

Output:

   Tarif  Weight  von   bis    GK
0  FedEx     0.6  0.0   1.0  G001
1  FedEx     0.5  0.0   1.0  G001
2    DHL     0.6  0.5   1.0  G002
3    DPD     7.5  5.0  10.0  G002

CodePudding user response:

Use a merge_asof:

(pd.merge_asof(df2.reset_index().drop(columns='GK', errors='ignore')
                  .sort_values(by='Weight'),
               df1.sort_values(by='von'),
               left_on='Weight', right_on='von', by='Tarif'
              )
   .set_index('index')
   # the line below is only necessary if the bins are disjoint
   # or if there is a risk that the Weight is greater than the max "bis"
   .assign(GK=lambda d: d['GK'].mask(d['Weight'].gt(d['bis'])))
   .sort_index()
   #.drop(columns=['von', 'bis']) # uncomment to remove von/bis
)

Output:

       Tarif  Weight  von   bis    GK
index                                
0      FedEx     0.6  0.0   1.0  G001
1        DHL     0.6  0.5   1.0  G002
2      FedEx     0.5  0.0   1.0  G001
3        DPD     7.5  5.0  10.0  G002
  • Related