Home > Back-end >  Column multiplication to replace original columns within panda dataframe
Column multiplication to replace original columns within panda dataframe

Time:02-12

I am trying to find a way to bring the new calculated columns back to the dataframe after doing the multiplication. However, I want them to replace the value under original columns like '2018, 2019, 2020'. Is there a way to do it alongside with the multiplication, especially if I have a long list of columns to be multiplied?

import pandas as pd

df1 = pd.DataFrame({
    'ID':  ['a1', 'b1', 'c1'],
    '2018': [1, 5, 9], 
    '2019': [2, 6, 10], 
    '2020': [3, 7, 11]})

df2 = pd.DataFrame({
    'ID':  ['a1', 'b1'],
    'percentage': [0.6, 0.4]})

df1.filter(regex='2018|2019|2020').multiply(df2["percentage"], axis="index")


Expected:
   ID 2018  2019  2020
0  a1  0.6   1.2   1.8
1  b1  2.0   2.4   2.8
2  c1  NaN   NaN   NaN

CodePudding user response:

You can align indices by converting both ID columns to indexes and then processing all columns:

df = df1.set_index('ID').multiply(df2.set_index('ID')["percentage"], axis="index")

print(df)
    2018  2019  2020
ID                  
a1   0.6   1.2   1.8
b1   2.0   2.4   2.8
c1   NaN   NaN   NaN

df2 = pd.DataFrame({
    'ID':  ['a1', 'c1'],
    'percentage': [0.6, 0.4]})

df = df1.set_index('ID').multiply(df2.set_index('ID')["percentage"], axis="index")

print(df)
    2018  2019  2020
ID                  
a1   0.6   1.2   1.8
b1   NaN   NaN   NaN
c1   3.6   4.0   4.4

If only needing to multiply some columns:

cols = ['2018','2019']
df1 = df1.set_index('ID')
df1[cols] = df1[cols].multiply(df2.set_index('ID')["percentage"], axis="index")

print(df1)

    2018  2019  2020
ID                  
a1   0.6   1.2     3
b1   NaN   NaN     7
c1   3.6   4.0    11

HI why are you setting index in the last part of the answer of yours

Because not setting the index before multiplying produces incorrect output:

df2 = pd.DataFrame({
    'ID':  ['a1', 'c1'],
    'percentage': [0.6, 0.4]})

cols = ['2018', '2019', '2020']

df1[cols] = df1[cols].mul(df2["percentage"], axis=0)
print (df1)
   ID  2018  2019  2020
0  a1   0.6   1.2   1.8
1  b1   2.0   2.4   2.8 <- incorrect result (aligned on index 1 not b1)
2  c1   NaN   NaN   NaN <- incorrect result (aligned on index 2 not c1)
  • Related