Home > Mobile >  How can I merge these two dataframes correctly?
How can I merge these two dataframes correctly?

Time:01-02

there, I have two data frames like in the following table

First is df_zero_purchase: Includes around 4700 rows.

OrderItemSKU PurchasePrice TotalWeight
4188-DE 0.0 2.5
5300-MY 0.0 3.8
1889-XC 0.0 4.7
df_zero_purchase =  pd.DataFrame({
  "OrderItemSKU": ['4188-DE', '5300-MY', '1889-XC'],
  "PurchasePrice": [0, 0, 0],
  "TotalWeight":[2.5, 3.8, 4.5]
})

And the second is df_purchase: Includes 4814 rows.

OrderItemSKU PurchasePrice
4188-DE 5.5
5300-MY 8.3
1889-XC 2.1
df_purchase = pd.DataFrame({
  "OrderItemSKU": ['4188-DE', '5300-MY', '1889-XC'],
  "PurchasePrice": [5.5, 8.3, 2.1],
})

I just wanted to update the zero PurchasePrices on my first data frame .I tried the following code but at the and it gives as shape with almost 50000 rows. I don't understand why ? So I need your help...

df_merged = pd.merge(df_zero_purchase, df_purchase[['OrderItemSKU', 'PurchasePrice']], on='ORDERITEMSKU')

CodePudding user response:

A possible solution, which is based on pandas.DataFrame.update, where df1 and df2 are, respectively, the first and the second dataframes:

df1['PurchasePrice'].update(df2['PurchasePrice'])

In case the dataframes are not in the same order, we can use pandas.DataFrame.merge:

pd.merge(df1, df2, on='OrderItemSKU', suffixes=('_','')).iloc[:, [0,3,2]]

Or again pandas.DataFrame.update, but setting OrderItemSKU as index in both dataframes:

df1 = df1.set_index('OrderItemSKU')
df1.update(df2.set_index('OrderItemSKU'))
df1 = df1.reset_index()

Output:

  OrderItemSKU  PurchasePrice  TotalWeight
0      4188-DE            5.5          2.5
1      5300-MY            8.3          3.8
2      1889-XC            2.1          4.7

CodePudding user response:

You must have duplicate values for "OrderItemSKU" column. Please remove duplicate from both dataframe. and then try to merge.

Use df[df.duplicated('OrderItemSKU')], for both dataframe. If you want to remove duplicate use - new_df = df[df.duplicated(keep = 'first')]

  • Related