I want to join 2 tables the first one is
SCORE_BIN NAME GROUP
(0,0.5] A LOW
(0.5,1] A HIGH
(0,0.2] B LOW
(0.2,1] B HIGH
The second one is
SCORE CUST_NM
0.1 A
0.8 A
0.9 B
0.1 B
My expected output is
SCORE CUST_NM GROUP
0.1 A LOW
0.8 A HIGH
0.9 B HIGH
0.1 B LOW
ps. "SCORE_BIN" column type is interval that getting from pd.qcut function.
CodePudding user response:
You can reuse bins from qcut
to another DataFrame and then use merge
:
df1["SCORE_BIN"] = pd.qcut(df1["SCORE"], 2)
##https://stackoverflow.com/a/37906403/2901002
ser, bins = pd.qcut(df1["SCORE"], 2, retbins=True)
print(bins)
df2['SCORE_BIN'] = pd.cut(df2["SCORE"], bins=bins, include_lowest=True)
EDIT: If is necessary use qcut
per groups use:
print (df1)
SCORE_BIN NAME GROUP SCORE
0 None A LOW 0.0
1 None A LOW 0.5
2 None A HIGH 1.0
3 None B LOW 0.0
4 None B LOW 0.2
5 None B HIGH 1.0
print (df2)
SCORE CUST_NM
0 0.1 A
1 0.8 A
2 0.9 B
3 0.1 B
d = {}
def f(x):
#https://stackoverflow.com/a/37906403/2901002
ser, bins = pd.qcut(x, 2, retbins=True)
d[x.name] = bins
return ser
df1["SCORE_BIN"] = df1.groupby('NAME')["SCORE"].transform(f)
print(df1)
SCORE_BIN NAME GROUP SCORE
0 (-0.001, 0.5] A LOW 0.0
1 (-0.001, 0.5] A LOW 0.5
2 (0.5, 1.0] A HIGH 1.0
3 (-0.001, 0.2] B LOW 0.0
4 (-0.001, 0.2] B LOW 0.2
5 (0.2, 1.0] B HIGH 1.0
print (d)
{'A': array([0. , 0.5, 1. ]), 'B': array([0. , 0.2, 1. ])}
f = lambda x: pd.cut(x, bins=d[x.name], include_lowest=True)
df2["SCORE_BIN"] = df2.groupby('CUST_NM')["SCORE"].transform(f)
print(df2)
SCORE CUST_NM SCORE_BIN
0 0.1 A (-0.001, 0.5]
1 0.8 A (0.5, 1.0]
2 0.9 B (0.2, 1.0]
3 0.1 B (-0.001, 0.2]
df = df2.merge(df1, how='left',
left_on=['SCORE_BIN','CUST_NM'],
right_on=['SCORE_BIN','NAME'])
print (df)