I have two different size tables with approximately 15000 rows each, both on the same excel sheet and following the same template.
I need to align the costs with same Product ID, Region and Country and get rid of the duplicate info, so that it looks like this.
I already separated the file in 2 pandas dataframes and tried np.where
and isin
to align the rows, but that way I only managed to align them using one column, and for this case I need 3.
Is there any pandas way to do this?
CodePudding user response:
Please try this (very similar to other answer, but used merge
):
df1 = pd.DataFrame({'ProductID': [432, 653, 438,432, 567, 438,432],
'Region': ['SA', 'BR', 'EU', 'NA', 'BR', 'NA', 'SA'],
'Country': ['Columbia', 'Brazil', 'Spain', 'USA', 'Brazil', 'Canada', 'Chile'],
'Cost Wk1': [400, 4200, 290, 390, 800, 290, 425]})
df2 = pd.DataFrame({'ProductID': [438, 432, 432, 432, 653, 567, 438],
'Region': ['NA', 'NA', 'SA', 'SA', 'BR', 'BR', 'EU'],
'Country': ['Canada','USA','Columbia','Chile','Brazil', 'Brazil', 'Spain'],
'Cost Wk2': [300, 500, 410, 430, 4100, 820, 200]})
df3 = pd.merge(df1, df2, how = 'left', on = ['ProductID', 'Region', 'Country'])
print(df3)
ProductID Region Country Cost Wk1 Cost Wk2
0 432 SA Columbia 400 410
1 653 BR Brazil 4200 4100
2 438 EU Spain 290 200
3 432 NA USA 390 500
4 567 BR Brazil 800 820
5 438 NA Canada 290 300
6 432 SA Chile 425 430
CodePudding user response:
I believe this will do what your question asks:
df1 = df1.join(df2.set_index(['Product ID', 'Region', 'Country']), on=['Product ID', 'Region', 'Country'])
Input:
df1:
Product ID Region Country Cost Wk1
0 432 SA Colombia 400
1 653 BR Brazil 4200
2 438 EU Spain 290
3 432 NA USA 390
4 567 BR Brazil 800
5 438 NA Canada 290
6 432 SA Chile 425
df2:
Product ID Region Country Cost Wk2
0 438 NA Canada 300
1 432 NA USA 500
2 432 SA Colombia 410
3 432 SA Chile 430
4 653 BR Brazil 4100
5 567 BR Brazil 820
6 438 EU Spain 200
result:
Product ID Region Country Cost Wk1 Cost Wk2
0 432 SA Colombia 400 410
1 653 BR Brazil 4200 4100
2 438 EU Spain 290 200
3 432 NA USA 390 500
4 567 BR Brazil 800 820
5 438 NA Canada 290 300
6 432 SA Chile 425 430
Note that my result differs from the one in your question (4200 and 800 are swapped for Brazil), but I think it's correct.