Function with if case to merge two columns or three in pandas

Time:01-04

I'm trying to solve an interesting problem and would welcome any suggestions.

I want to merge two dataframes on three columns, but if the third column in the first dataframe is NaN for a row, that row should be merged on the first two columns only.

Example:

---DataFrame 1---

Number  Number2  Name
1       2        One
2       2        NaN
3       2        Three

---DataFrame 2---

Number  Number2  Name2
1       2        One
2       2        Two
2       2        Two.5
3       2        Three
3       2        Three.5
4       2        Four

---Result---

Number  Number2  Name   Name2
1       2        One    One
2       2        NaN    Two
2       2        NaN    Two.5
3       2        Three  Three

So far I have tried writing a function for this:

def merge_three_or_two(row):
    if row['Name'] == np.nan:
        row = pd.merge(row, df2, how='left', left_on=['Number','Number2'], right_on=['Number','Number2'])
    else:
        row = pd.merge(row, df2, how='left', left_on=['Number','Number2','Name'], right_on=['Number','Number2','Name2'])

df1 = df1.apply(merge_three_or_two, axis=1)

CodePudding user response:

Try using .isna().any() in the condition:

if df1.Name.isna().any():
    print(df1.merge(df2, how='left', on=['Number', 'Number2']))
else:
    print(df1.merge(df2, how='left', left_on=['Number','Number2','Name'], right_on = ['Number','Number2','Name2']))
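Note that the condition above is evaluated once for the whole frame, so as soon as any Name is missing, every row is merged on two keys only. If the choice of keys should be made per row, one possible sketch is to split df1 by whether Name is missing, merge each part with its own keys, and stack the parts again. The sample frames are rebuilt from the tables in the question; the names missing, two_key, three_key and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's example tables.
df1 = pd.DataFrame({
    "Number": [1, 2, 3],
    "Number2": [2, 2, 2],
    "Name": ["One", np.nan, "Three"],
})
df2 = pd.DataFrame({
    "Number": [1, 2, 2, 3, 3, 4],
    "Number2": [2, 2, 2, 2, 2, 2],
    "Name2": ["One", "Two", "Two.5", "Three", "Three.5", "Four"],
})

# NaN never compares equal to anything (itself included), so a test
# like row['Name'] == np.nan is always False; use isna() instead.
missing = df1["Name"].isna()

# Rows without a Name: match on the two numeric keys only.
two_key = df1[missing].merge(df2, how="left", on=["Number", "Number2"])

# Rows with a Name: match on all three keys.
three_key = df1[~missing].merge(
    df2,
    how="left",
    left_on=["Number", "Number2", "Name"],
    right_on=["Number", "Number2", "Name2"],
)

# Stack both parts and restore a readable row order.
result = (
    pd.concat([three_key, two_key], ignore_index=True)
    .sort_values(["Number", "Number2", "Name2"])
    .reset_index(drop=True)
)
print(result)
```

With how='left', a df1 row that has a Name but no three-key match in df2 is kept, with NaN in Name2.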

CodePudding user response:

You can first merge df1 and df2 on the 'Number' and 'Number2' columns, then drop the rows that fail the additional condition, i.e. rows where Name is present but differs from Name2:

df3 = df1.merge(df2, how='left', left_on=['Number','Number2'], right_on=['Number','Number2'])
df3.drop(df3[df3['Name'].notna() & (df3['Name'] != df3['Name2'])].index, inplace=True)
print(df3)

   Number  Number2   Name  Name2
0       1        2    One    One
1       2        2    NaN    Two
2       2        2    NaN  Two.5
3       3        2  Three  Three
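For anyone who wants to reproduce this, here is a self-contained sketch of the same merge-then-filter idea. It assumes the sample data from the question and uses a boolean mask instead of drop(..., inplace=True); the names merged and keep are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's example tables.
df1 = pd.DataFrame({
    "Number": [1, 2, 3],
    "Number2": [2, 2, 2],
    "Name": ["One", np.nan, "Three"],
})
df2 = pd.DataFrame({
    "Number": [1, 2, 2, 3, 3, 4],
    "Number2": [2, 2, 2, 2, 2, 2],
    "Name2": ["One", "Two", "Two.5", "Three", "Three.5", "Four"],
})

# Step 1: merge on the two keys that always apply.
merged = df1.merge(df2, how="left", on=["Number", "Number2"])

# Step 2: keep a row if Name is missing (the two-key match is enough)
# or if Name agrees with Name2 (the three-key match holds).
keep = merged["Name"].isna() | (merged["Name"] == merged["Name2"])
df3 = merged[keep].reset_index(drop=True)
print(df3)
```

The mask is the logical complement of the drop condition above, so it selects the same rows. One thing to be aware of: a df1 row that has a Name but no equal Name2 in df2 is removed entirely, so for those rows the result behaves like an inner join rather than a left join.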