Home > Software engineering >  Align rows with same column values - Pandas
Align rows with same column values - Pandas

Time:06-29

I have two different size tables with approximately 15000 rows each, both on the same excel sheet and following the same template.

template

I need to align the costs with same Product ID, Region and Country and get rid of the duplicate info, so that it looks like this.

this

I already separated the file in 2 pandas dataframes and tried np.where and isin to align the rows, but that way I only managed to align them using one column, and for this case I need 3. Is there any pandas way to do this?

CodePudding user response:

Please try this (very similar to other answer, but used merge):

df1 = pd.DataFrame({'ProductID': [432, 653, 438,432, 567, 438,432],
                   'Region': ['SA', 'BR', 'EU', 'NA', 'BR', 'NA', 'SA'],
                   'Country': ['Columbia', 'Brazil', 'Spain', 'USA', 'Brazil', 'Canada', 'Chile'],
                   'Cost Wk1': [400, 4200, 290, 390, 800, 290, 425]})

df2 = pd.DataFrame({'ProductID': [438, 432, 432, 432, 653, 567, 438],
                   'Region': ['NA', 'NA', 'SA', 'SA', 'BR', 'BR', 'EU'],
                   'Country': ['Canada','USA','Columbia','Chile','Brazil', 'Brazil', 'Spain'],
                   'Cost Wk2': [300, 500, 410, 430, 4100, 820, 200]})

df3 = pd.merge(df1, df2, how = 'left', on = ['ProductID', 'Region', 'Country'])

 print(df3)

ProductID Region   Country  Cost Wk1  Cost Wk2
0        432     SA  Columbia       400       410
1        653     BR    Brazil      4200      4100
2        438     EU     Spain       290       200
3        432     NA       USA       390       500
4        567     BR    Brazil       800       820
5        438     NA    Canada       290       300
6        432     SA     Chile       425       430

CodePudding user response:

I believe this will do what your question asks:

df1 = df1.join(df2.set_index(['Product ID', 'Region', 'Country']), on=['Product ID', 'Region', 'Country'])

Input:

df1:
   Product ID Region   Country  Cost Wk1
0         432     SA  Colombia       400
1         653     BR    Brazil      4200
2         438     EU     Spain       290
3         432     NA       USA       390
4         567     BR    Brazil       800
5         438     NA    Canada       290
6         432     SA     Chile       425
df2:
   Product ID Region   Country  Cost Wk2
0         438     NA    Canada       300
1         432     NA       USA       500
2         432     SA  Colombia       410
3         432     SA     Chile       430
4         653     BR    Brazil      4100
5         567     BR    Brazil       820
6         438     EU     Spain       200
result:
   Product ID Region   Country  Cost Wk1  Cost Wk2
0         432     SA  Colombia       400       410
1         653     BR    Brazil      4200      4100
2         438     EU     Spain       290       200
3         432     NA       USA       390       500
4         567     BR    Brazil       800       820
5         438     NA    Canada       290       300
6         432     SA     Chile       425       430

Note that my result differs from the one in your question (4200 and 800 are swapped for Brazil), but I think it's correct.

  • Related