I have this 2 dataframes.
import pandas as pd
data1 = {
'Product': ['product1', 'product2', 'product3'],
'Price': [200, 300, 400],
'Quantity': [10, 5, 20],
}
df1 = pd.DataFrame(data1, columns= ['Product','Price','Quantity'])
print(df1)
data2 = {
'Product': ['product1','product2','product4'],
'Price': [200, 1000,50],
}
df2 = pd.DataFrame(data2, columns= ['Product','Price'])
df1:
Product Price Quantity
0 product1 200 10
1 product2 300 5
2 product3 400 20
df2:
Product Price
0 product1 200
1 product2 1000
2 product4 50
I search for concatening and updating both to obtain this dataframe:
Product Price Quantity
0 product1 200 10
1 product2 1000 5
2 product3 -1 20
2 product4 50 NaN
This means that:
- New product in df2 (product4) has to be added with the available information (Price)
- Product which is not in df2 should be kept with Price set to -1
- Product in df1 and df2 has to only has his price updated (product2)
- All other Products are kept the same.
Thank you for your help.
CodePudding user response:
Here's a merge based solution:
- Get the
Quantity
value fromdf1
by merging the two DataFrames on theProduct
. ThePrice
fromdf1
is dropped so that those prices aren't added to the final DataFrame. This is an outer merge to ensure the result has products from bothdf1
anddf2
. - The above step almost gets you to the desired result, except for needing to replace missing prices with
-1
and sorting based on product.
final = df2.merge(df1.drop('Price', axis=1), on='Product', how='outer')
final['Price'].fillna(-1, inplace=True)
final.sort_values('Product', inplace=True)
Result:
Product Price Quantity
0 product1 200.0 10.0
1 product2 1000.0 5.0
3 product3 -1.0 20.0
2 product4 50.0 NaN