Home > Blockchain >  How to join table in pandas with multiple interval conditions?
How to join table in pandas with multiple interval conditions?

Time:10-29

I want to join 2 tables the first one is

SCORE_BIN   NAME  GROUP
(0,0.5]      A     LOW
(0.5,1]      A     HIGH
(0,0.2]      B     LOW
(0.2,1]      B     HIGH

The second one is

SCORE   CUST_NM
0.1      A      
0.8      A      
0.9      B      
0.1      B      

My expected output is

SCORE   CUST_NM  GROUP
0.1      A        LOW
0.8      A        HIGH
0.9      B        HIGH
0.1      B        LOW

ps. "SCORE_BIN" column type is interval that getting from pd.qcut function.

CodePudding user response:

You can reuse bins from qcut to another DataFrame and then use merge:

df1["SCORE_BIN"] = pd.qcut(df1["SCORE"], 2)

##https://stackoverflow.com/a/37906403/2901002
ser, bins = pd.qcut(df1["SCORE"], 2, retbins=True)
print(bins)

df2['SCORE_BIN'] = pd.cut(df2["SCORE"], bins=bins, include_lowest=True)

EDIT: If is necessary use qcut per groups use:

print (df1)
  SCORE_BIN NAME GROUP  SCORE
0      None    A   LOW    0.0
1      None    A   LOW    0.5
2      None    A  HIGH    1.0
3      None    B   LOW    0.0
4      None    B   LOW    0.2
5      None    B  HIGH    1.0

print (df2)
  SCORE CUST_NM
0    0.1       A
1    0.8       A
2    0.9       B
3    0.1       B

d = {}
def f(x):
    #https://stackoverflow.com/a/37906403/2901002
    ser, bins = pd.qcut(x, 2, retbins=True)
    d[x.name] = bins
    return ser
    
    
df1["SCORE_BIN"] = df1.groupby('NAME')["SCORE"].transform(f)
print(df1)
       SCORE_BIN NAME GROUP  SCORE
0  (-0.001, 0.5]    A   LOW    0.0
1  (-0.001, 0.5]    A   LOW    0.5
2     (0.5, 1.0]    A  HIGH    1.0
3  (-0.001, 0.2]    B   LOW    0.0
4  (-0.001, 0.2]    B   LOW    0.2
5     (0.2, 1.0]    B  HIGH    1.0
       
print (d)
{'A': array([0. , 0.5, 1. ]), 'B': array([0. , 0.2, 1. ])}

f = lambda x: pd.cut(x, bins=d[x.name], include_lowest=True)
df2["SCORE_BIN"] = df2.groupby('CUST_NM')["SCORE"].transform(f)
print(df2)
   SCORE CUST_NM      SCORE_BIN
0    0.1       A  (-0.001, 0.5]
1    0.8       A     (0.5, 1.0]
2    0.9       B     (0.2, 1.0]
3    0.1       B  (-0.001, 0.2]

df = df2.merge(df1, how='left', 
               left_on=['SCORE_BIN','CUST_NM'], 
                right_on=['SCORE_BIN','NAME'])
print (df)
  • Related