Replace values in one dataframe with values from another dataframe-CodePudding

I have two dataframes and I am trying to replace all values of 0 from the first dataframe with certain values from the second dataframe.

The first dataframe looks like this:

Region	Country	Product	Quantity	Price
Africa	South Africa	ABC	500	1200
Africa	South Africa	DEF	200	400
Africa	South Africa	XYZ	0	0
Africa	Nigeria	DEF	150	450
Africa	Nigeria	XYZ	200	750
Asia	Japan	XYZ	100	500
Asia	Japan	ABC	200	500
Asia	Japan	DEF	0	0
Asia	India	XYZ	250	600
Asia	India	ABC	100	400
Asia	India	DEF	40	220

It contains quantities and total price for each type of product for each country. Some country and product combinations have values of 0.

The second dataframe looks like this:

Region	Product	Quantity	Price
Africa	ABC	375	825
Africa	DEF	125	250
Africa	XYZ	110	300
Asia	XYZ	200	500
Asia	ABC	200	500
Asia	DEF	120	300

The second dataframe contains the region averages for quantity and price across each product.

The task is to replace all 0 values in dataframe 1 with the averages for that region and product from dataframe 2.

For example, the 0 values for South Africa for product XYZ in dataframe 1 should be replaced with the values for product XYZ for Africa in dataframe 2. The other values for South Africa should stay the same and so on.

Any tips on how to do this?

CodePudding user response：

First, get all average values merged into the first df, a joining is suffice.

df1_merged = df1.merge(df2, how='left', on=['Region', 'Product'], suffixes=('','_Average'))

we get df1_merged as

	Region	Country	Product	Quantity	Price	Quantity_Average	Price_Average
0	Africa	South Africa	ABC	500	1200	375	825
1	Africa	South Africa	DEF	200	400	125	250
2	Africa	South Africa	XYZ	0	0	110	300
3	Africa	Nigeria	DEF	150	450	125	250
4	Africa	Nigeria	XYZ	200	750	110	300
5	Asia	Japan	XYZ	100	500	200	500
6	Asia	Japan	ABC	200	500	200	500
7	Asia	Japan	DEF	0	0	120	300
8	Asia	India	XYZ	250	600	200	500
9	Asia	India	ABC	100	400	200	500
10	Asia	India	DEF	40	220	120	300

Then we can fillin these zero with Average columns with conditional loc

df1_merged.loc[df1_merged['Quantity'] == 0, 'Quantity'] = \
df1_merged['Quantity_Average']

df1_merged.loc[df1_merged['Price'] == 0, 'Price'] = \
df1_merged['Price_Average']

now we get df1_merged as

	Region	Country	Product	Quantity	Price	Quantity_Average	Price_Average
0	Africa	South Africa	ABC	500	1200	375	825
1	Africa	South Africa	DEF	200	400	125	250
2	Africa	South Africa	XYZ	110	300	110	300
3	Africa	Nigeria	DEF	150	450	125	250
4	Africa	Nigeria	XYZ	200	750	110	300
5	Asia	Japan	XYZ	100	500	200	500
6	Asia	Japan	ABC	200	500	200	500
7	Asia	Japan	DEF	120	300	120	300
8	Asia	India	XYZ	250	600	200	500
9	Asia	India	ABC	100	400	200	500
10	Asia	India	DEF	40	220	120	300

Finally, drop unneeded columns with

df1_merged = df1_merged.drop(['Quantity_Average', 'Price_Average'], axis=1)

and we get

	Region	Country	Product	Quantity	Price
0	Africa	South Africa	ABC	500	1200
1	Africa	South Africa	DEF	200	400
2	Africa	South Africa	XYZ	110	300
3	Africa	Nigeria	DEF	150	450
4	Africa	Nigeria	XYZ	200	750
5	Asia	Japan	XYZ	100	500
6	Asia	Japan	ABC	200	500
7	Asia	Japan	DEF	120	300
8	Asia	India	XYZ	250	600
9	Asia	India	ABC	100	400
10	Asia	India	DEF	40	220

CodePudding user response：

You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.

Be wary of two things:

Use overwrite=False to only fill the null values
update modifies inplace

common_index = ['Region','Product']
df_indexed = df.replace(0,np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

df_indexed.update(df2_indexed,overwrite=False)

print(df_indexed.reset_index())

    Region Product       Country  Quantity   Price
0   Africa     ABC  South Africa     500.0  1200.0
1   Africa     DEF  South Africa     200.0   400.0
2   Africa     XYZ  South Africa     110.0   300.0
3   Africa     DEF       Nigeria     150.0   450.0
4   Africa     XYZ       Nigeria     200.0   750.0
5     Asia     XYZ         Japan     100.0   500.0
6     Asia     ABC         Japan     200.0   500.0
7     Asia     DEF         Japan     120.0   300.0
8     Asia     XYZ         India     250.0   600.0
9     Asia     ABC         India     100.0   400.0
10    Asia     DEF         India      40.0   220.0