I want to multiply column values by a specific scalar based on the name of the column:
- if column name = "Math", then all the values in 'Math" column should be multiply by 5;
- if column name = "Physique", values in that column should be multiply by 4;
- if column name = "Bio", values in that column should be multiplied by 3;
- all the remaining columns should be multiplied by 2
What I have:
This is what I should have :
listm = ['Math', 'Physique', 'Bio']
def note_coef(row):
for m in listm:
if 'Math' in listm:
result = df['Math']*5
return result
df2=df.apply(note_coef)
df2
Note I stopped with only 1 if to test my code but the outcome is not what I expected. I am quite new in programming and here as well.
CodePudding user response:
Fake data
n=5
d = {'a':np.ones(n),
'b':np.ones(n),
'c':np.ones(n),
'd':np.ones(n)}
df = pd.DataFrame(d)
print(df)
Select the columns and multiply by a tuple.
df[['a','c']] = df[['a','c']] * (2,4)
print(df)
a b c d
0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0
2 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
a b c d
0 2.0 1.0 4.0 1.0
1 2.0 1.0 4.0 1.0
2 2.0 1.0 4.0 1.0
3 2.0 1.0 4.0 1.0
4 2.0 1.0 4.0 1.0
CodePudding user response:
You can use df['col_name'].multiply(value)
to apply on a whole column. The remaining columns can be modified in a loop of all columns except listm
.
listm = ['Math', 'Physique', 'Bio']
for i, head in enumerate(listm):
df[head] = df[head].multiply(5-i)
heads = df.head()
for head in heads:
if not head in listm:
df[head] = df[head].multiply(2)
CodePudding user response:
here is another way to do it using matrix multiplication
For sleuths matching the output, the data was not provided as a text, so created the test data in a patter of the screen shot
mul = [5,4,3,2,2,2,2,1] # multipliers
df1=df.iloc[:,1:].mul(mul)
df1.total = df1.iloc[:,:7].sum(axis=1)
df.update(df1, join='left', overwrite=True)
df
source Math Physics Bio Algo Archi Sport eng total
0 A 50.0 60.0 60.0 50.0 60.0 70.0 80.0 430.0
1 B 55.0 64.0 63.0 52.0 62.0 72.0 82.0 450.0
2 C 5.5 8.4 9.3 NaN NaN NaN NaN 23.2
3 D NaN NaN NaN 22.0 42.0 62.0 82.0 208.0
4 E 6.0 8.8 9.6 NaN NaN NaN NaN 24.4
5 F NaN NaN NaN 24.0 44.0 64.0 84.0 216.0
TEST DATA
data_out = [
['A', 10,15,20,25,30,35,40],
['B', 11,16,21,26,31,36,41],
['C', 1.1,2.1,3.1],
['D', np.NaN,np.NaN,np.NaN,11,21,31,41],
['E', 1.2,2.2,3.2],
['F', np.NaN,np.NaN,np.NaN,12,22,32,42],
]
df=pd.DataFrame(data_out, columns=[ 'source', 'Math', 'Physics', 'Bio', 'Algo', 'Archi', 'Sport', 'eng'])
df['total'] = df.iloc[:,1:].sum(axis=1)
source Math Physics Bio Algo Archi Sport eng total
0 A 10.0 15.0 20.0 25.0 30.0 35.0 40.0 175.0
1 B 11.0 16.0 21.0 26.0 31.0 36.0 41.0 182.0
2 C 1.1 2.1 3.1 NaN NaN NaN NaN 6.3
3 D NaN NaN NaN 11.0 21.0 31.0 41.0 104.0
4 E 1.2 2.2 3.2 NaN NaN NaN NaN 6.6
5 F NaN NaN NaN 12.0 22.0 32.0 42.0 108.0