I am trying to merge two dataframes together to replace possible missing values. For example:
df1:
Fruit Price
Apple 5
Banana 7
Orange 4
df2:
Fruit Price
Pear 3
result:
Fruit Price
Apple 5
Banana 7
Orange 4
Pear 3
I've used the concat method to do this and it works, however when both dataframes have the same column instead of having both duplicate values I want to use the data from the first df and ignore the second df. For example:
df1:
Fruit Price
Apple 5
Banana 7
Orange 4
Pear 6
df2:
Fruit Price
Pear 3
result:
Fruit Price
Apple 5
Banana 7
Orange 4
Pear 6
How can I do this so it works on both examples?
CodePudding user response:
You could use concat
and chain drop_duplicates
on your Fruit column. Just always make sure to put the dataframe you want the values to keep in case of duplicated as the first parameter in concat
:
# Case 1
>>> pd.concat([df1b,df2]).drop_duplicates('Fruit')
Fruit Price
0 Apple 5
1 Banana 7
2 Orange 4
0 Pear 3
# Case 2
>>> pd.concat([df1,df2]).drop_duplicates('Fruit')
Fruit Price
0 Apple 5
1 Banana 7
2 Orange 4
3 Pear 6
Setup
df1 = pd.DataFrame({'Fruit':['Apple','Banana','Orange','Pear'],
'Price':[5,7,4,6]})
df1b = pd.DataFrame({'Fruit':['Apple','Banana','Orange'],
'Price':[5,7,4]})
df2 = pd.DataFrame({'Fruit':['Pear'],
'Price':[3]})
CodePudding user response:
here is one way to do it: merge df1 and df2 and keep only the ones where there is no match and the concatenate with the first one
When the Fruit already exists, the second will be empty and won't be merged
pd.concat([df1,
df2.merge(df1, on='Fruit', how='left', suffixes=('','_y')).query('Price_y.isnull()').drop(columns='Price_y')]
)
Fruit Price
0 Apple 5
1 Banana 7
2 Orange 4
0 Pear 3