df1:
**Tarif von bis GK**
FedEx 0.0 1.0 G001
FedEx 1.0 2.0 G002
...
DHL. 0.0 0.5 G001
DHL. 0.5 1.0 G002
...
DPD 0.0 5.0 G001
DPD 5.0 10.0 G002
df2:
**Tarif Weight GK**
FedEx 0.6
DHL 0.6
FedEx 0.5
DPD 7.5
My attempt:
for i in range(len(df2)):
df2.loc[[i]['GK'] = df1['GK'].loc[(df1['Tarif'] == df2.loc[[i]]['Tarif'])
& (df1['von'] < df2[[i]]['Weight'])
& (df1['bis'] >= df2[[i]]['Weight'])]
ValueError: Can only compare identically-labeled Series objects*
Result should be
df2:
**Tarif Weight GK****
FedEx 0.6. G001
DHL 0.6. G002
FedEx 0.5. G001
DPD 3.5. G002
CodePudding user response:
Another possible solution, which is based on the following ideas:
Merge the two dataframes as usual with
pandas.DataFrame.merge
.Filter out the cases that do not satisfy the conditions.
out = df2.iloc[:,:2].merge(df1, on='Tarif')
out = out.loc[out['von'].lt(out['Weight']) & out['bis'].ge(out['Weight'])]
out = out.reset_index(drop=True)
Output:
Tarif Weight von bis GK
0 FedEx 0.6 0.0 1.0 G001
1 FedEx 0.5 0.0 1.0 G001
2 DHL 0.6 0.5 1.0 G002
3 DPD 7.5 5.0 10.0 G002
CodePudding user response:
Use a merge_asof
:
(pd.merge_asof(df2.reset_index().drop(columns='GK', errors='ignore')
.sort_values(by='Weight'),
df1.sort_values(by='von'),
left_on='Weight', right_on='von', by='Tarif'
)
.set_index('index')
# the line below is only necessary if the bins are disjoint
# or if there is a risk that the Weight is greater than the max "bis"
.assign(GK=lambda d: d['GK'].mask(d['Weight'].gt(d['bis'])))
.sort_index()
#.drop(columns=['von', 'bis']) # uncomment to remove von/bis
)
Output:
Tarif Weight von bis GK
index
0 FedEx 0.6 0.0 1.0 G001
1 DHL 0.6 0.5 1.0 G002
2 FedEx 0.5 0.0 1.0 G001
3 DPD 7.5 5.0 10.0 G002