Home > Mobile >  How to merge pandas table by regex on multiple columns
How to merge pandas table by regex on multiple columns

Time:09-22

Given the following data frames join\merge df1 and df2 to result in df3 in the following manner: df1.a join on regex matching to df2.a AND df1.b join on regex matching to df2.b

df1 = pd.DataFrame({'a': ['foo', 'bar','zoolo','foo','foo'], 'b': ['tty','abc', 'uhg','abc','tty'], 'c':[14,72,93,33,57]})
a b c
0 foo tty 14
1 bar abc 72
2 zoolo uhg 93
3 foo abc 33
4 foo tty 57
df2 = pd.DataFrame({'a': ['zool(o|r)','foo|oof','[badr] '],'b': ['uh(f|g)','[ty] ','abc|bac'], 'j': [11,32,65]})
a b j
0 zool(o|r) uh(f|g) 11
1 foo|oof [ty] 32
2 [badr] abc|bac 65

Result

df3 = pd.DataFrame({'a': ['foo', 'bar','zoolo','foo'], 'b': ['tty','abc', 'uhg','tty'], 'c':[14,72,93,57],'j': [32,65,11,32]}, )
a b c j
0 foo tty 14 32
1 bar abc 72 65
2 zoolo uhg 93 11
4 foo tty 57 32

CodePudding user response:

def func(dfRow):
    for index, row in df2.iterrows():
        if re.match(row["a"], dfRow["a"]) and re.match(row["b"], dfRow["b"]):
            return dfRow.append(row[["j"]])

df1.apply(func, axis=1).dropna()
  • Related