I am trying to find a way to bring the new calculated columns back to the dataframe after doing the multiplication. However, I want them to replace the value under original columns like '2018, 2019, 2020'. Is there a way to do it alongside with the multiplication, especially if I have a long list of columns to be multiplied?
import pandas as pd
df1 = pd.DataFrame({
'ID': ['a1', 'b1', 'c1'],
'2018': [1, 5, 9],
'2019': [2, 6, 10],
'2020': [3, 7, 11]})
df2 = pd.DataFrame({
'ID': ['a1', 'b1'],
'percentage': [0.6, 0.4]})
df1.filter(regex='2018|2019|2020').multiply(df2["percentage"], axis="index")
Expected:
ID 2018 2019 2020
0 a1 0.6 1.2 1.8
1 b1 2.0 2.4 2.8
2 c1 NaN NaN NaN
CodePudding user response:
You can align indices by converting both ID
columns to indexes and then processing all columns:
df = df1.set_index('ID').multiply(df2.set_index('ID')["percentage"], axis="index")
print(df)
2018 2019 2020
ID
a1 0.6 1.2 1.8
b1 2.0 2.4 2.8
c1 NaN NaN NaN
df2 = pd.DataFrame({
'ID': ['a1', 'c1'],
'percentage': [0.6, 0.4]})
df = df1.set_index('ID').multiply(df2.set_index('ID')["percentage"], axis="index")
print(df)
2018 2019 2020
ID
a1 0.6 1.2 1.8
b1 NaN NaN NaN
c1 3.6 4.0 4.4
If only needing to multiply some columns:
cols = ['2018','2019']
df1 = df1.set_index('ID')
df1[cols] = df1[cols].multiply(df2.set_index('ID')["percentage"], axis="index")
print(df1)
2018 2019 2020
ID
a1 0.6 1.2 3
b1 NaN NaN 7
c1 3.6 4.0 11
HI why are you setting index in the last part of the answer of yours
Because not setting the index before multiplying produces incorrect output:
df2 = pd.DataFrame({
'ID': ['a1', 'c1'],
'percentage': [0.6, 0.4]})
cols = ['2018', '2019', '2020']
df1[cols] = df1[cols].mul(df2["percentage"], axis=0)
print (df1)
ID 2018 2019 2020
0 a1 0.6 1.2 1.8
1 b1 2.0 2.4 2.8 <- incorrect result (aligned on index 1 not b1)
2 c1 NaN NaN NaN <- incorrect result (aligned on index 2 not c1)