there, I have two data frames like in the following table
First is df_zero_purchase: Includes around 4700 rows.
OrderItemSKU | PurchasePrice | TotalWeight |
---|---|---|
4188-DE | 0.0 | 2.5 |
5300-MY | 0.0 | 3.8 |
1889-XC | 0.0 | 4.7 |
df_zero_purchase = pd.DataFrame({
"OrderItemSKU": ['4188-DE', '5300-MY', '1889-XC'],
"PurchasePrice": [0, 0, 0],
"TotalWeight":[2.5, 3.8, 4.5]
})
And the second is df_purchase: Includes 4814 rows.
OrderItemSKU | PurchasePrice |
---|---|
4188-DE | 5.5 |
5300-MY | 8.3 |
1889-XC | 2.1 |
df_purchase = pd.DataFrame({
"OrderItemSKU": ['4188-DE', '5300-MY', '1889-XC'],
"PurchasePrice": [5.5, 8.3, 2.1],
})
I just wanted to update the zero PurchasePrices on my first data frame .I tried the following code but at the and it gives as shape with almost 50000 rows. I don't understand why ? So I need your help...
df_merged = pd.merge(df_zero_purchase, df_purchase[['OrderItemSKU', 'PurchasePrice']], on='ORDERITEMSKU')
CodePudding user response:
A possible solution, which is based on pandas.DataFrame.update
, where df1
and df2
are, respectively, the first and the second dataframes:
df1['PurchasePrice'].update(df2['PurchasePrice'])
In case the dataframes are not in the same order, we can use pandas.DataFrame.merge
:
pd.merge(df1, df2, on='OrderItemSKU', suffixes=('_','')).iloc[:, [0,3,2]]
Or again pandas.DataFrame.update
, but setting OrderItemSKU
as index in both dataframes:
df1 = df1.set_index('OrderItemSKU')
df1.update(df2.set_index('OrderItemSKU'))
df1 = df1.reset_index()
Output:
OrderItemSKU PurchasePrice TotalWeight
0 4188-DE 5.5 2.5
1 5300-MY 8.3 3.8
2 1889-XC 2.1 4.7
CodePudding user response:
You must have duplicate values for "OrderItemSKU" column. Please remove duplicate from both dataframe. and then try to merge.
Use df[df.duplicated('OrderItemSKU')], for both dataframe. If you want to remove duplicate use - new_df = df[df.duplicated(keep = 'first')]