I'm trying to solve an interesting problem and would like any suggestions.
What I'm trying to do is merge two dataframes on three columns, but if the third column in the first dataframe has a NaN value, then merge on only the first two.
Example:
---DataFrame 1---
Number | Number2 | Name |
---|---|---|
1 | 2 | One |
2 | 2 | |
3 | 2 | Three |
---DataFrame 2---
Number | Number2 | Name2 |
---|---|---|
1 | 2 | One |
2 | 2 | Two |
2 | 2 | Two.5 |
3 | 2 | Three |
3 | 2 | Three.5 |
4 | 2 | Four |
---Result---
Number | Number2 | Name | Name2 |
---|---|---|---|
1 | 2 | One | One |
2 | 2 | | Two |
2 | 2 | | Two.5 |
3 | 2 | Three | Three |
So far I have tried writing a function for this:
def merge_three_or_two(row):
    if row['Name'] == np.nan:
        row = pd.merge(row, df2, how='left', left_on=['Number','Number2'], right_on = ['Number','Number2'])
    else:
        row = pd.merge(row, df2, how='left', left_on=['Number','Number2','Name'], right_on = ['Number','Number2','Name2'])

df1 = df1.apply(merge_three_or_two, axis=1)
CodePudding user response:
Try to use .isna().any() in the condition:
if df1.Name.isna().any():
    print(df1.merge(df2, how='left', on=['Number', 'Number2']))
else:
    print(df1.merge(df2, how='left', left_on=['Number','Number2','Name'], right_on = ['Number','Number2','Name2']))
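The condition above tests the whole Name column at once. If the choice of keys should instead be made per row, one possible sketch is to split df1 by whether Name is missing, merge each part on its own keys, and concatenate the pieces. The frame contents are copied from the question; the names `missing`, `two_key`, `three_key` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    "Number": [1, 2, 3],
    "Number2": [2, 2, 2],
    "Name": ["One", np.nan, "Three"],
})
df2 = pd.DataFrame({
    "Number": [1, 2, 2, 3, 3, 4],
    "Number2": [2, 2, 2, 2, 2, 2],
    "Name2": ["One", "Two", "Two.5", "Three", "Three.5", "Four"],
})

# `value == np.nan` is always False, so isna() is used to find missing names.
missing = df1["Name"].isna()

# Rows without a Name: match on the two numeric keys only.
two_key = df1[missing].merge(df2, how="left", on=["Number", "Number2"])

# Rows with a Name: match on all three keys.
three_key = df1[~missing].merge(
    df2,
    how="left",
    left_on=["Number", "Number2", "Name"],
    right_on=["Number", "Number2", "Name2"],
)

# Put the pieces back together, ordered by Number (stable sort keeps
# the df2 order within each Number).
result = (
    pd.concat([two_key, three_key])
    .sort_values("Number", kind="stable")
    .reset_index(drop=True)
)
print(result)
```

With a left merge on three keys, a df1 row whose Name has no counterpart in df2 would stay in the result with an empty Name2.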
CodePudding user response:
You can merge the df1 and df2 dataframes on the 'Number' and 'Number2' columns as a first step, then drop the rows where Name is present but differs from Name2:
df3 = df1.merge(df2, how='left', left_on=['Number','Number2'], right_on=['Number','Number2'])
df3.drop(df3[df3['Name'].notna() & (df3['Name'] != df3['Name2'])].index, inplace=True)
print(df3)
   Number  Number2   Name  Name2
0       1        2    One    One
1       2        2    NaN    Two
2       2        2    NaN  Two.5
3       3        2  Three  Three
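For anyone who wants to reproduce this, the same two-step idea can be sketched as a self-contained script that uses a boolean mask rather than an in-place drop. The frame contents are copied from the question; the names `merged` and `keep` are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    "Number": [1, 2, 3],
    "Number2": [2, 2, 2],
    "Name": ["One", np.nan, "Three"],
})
df2 = pd.DataFrame({
    "Number": [1, 2, 2, 3, 3, 4],
    "Number2": [2, 2, 2, 2, 2, 2],
    "Name2": ["One", "Two", "Two.5", "Three", "Three.5", "Four"],
})

# Step 1: merge on the two keys that always apply.
merged = df1.merge(df2, how="left", on=["Number", "Number2"])

# Step 2: keep a row when Name is missing, or when Name equals Name2.
keep = merged["Name"].isna() | (merged["Name"] == merged["Name2"])
df3 = merged[keep].reset_index(drop=True)
print(df3)
```

One design point to be aware of: a df1 row whose Name is present but has no equal Name2 is removed entirely by the filter, whereas a left merge on all three keys would keep it with an empty Name2.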