Home > Enterprise >  Multiply columns values by a scalar based on conditions DataFrame
Multiply columns values by a scalar based on conditions DataFrame

Time:07-01

I want to multiply column values by a specific scalar based on the name of the column:

  • if column name = "Math", then all the values in 'Math" column should be multiply by 5;
  • if column name = "Physique", values in that column should be multiply by 4;
  • if column name = "Bio", values in that column should be multiplied by 3;
  • all the remaining columns should be multiplied by 2

What I have:

enter image description here

This is what I should have :

enter image description here

listm = ['Math', 'Physique', 'Bio']
def note_coef(row):
    for m in listm:
        if 'Math' in listm:
             result = df['Math']*5
    return result

df2=df.apply(note_coef)
df2

Note I stopped with only 1 if to test my code but the outcome is not what I expected. I am quite new in programming and here as well.

CodePudding user response:

Fake data

n=5
d = {'a':np.ones(n),
     'b':np.ones(n),
     'c':np.ones(n),
     'd':np.ones(n)}
df = pd.DataFrame(d)
print(df)

Select the columns and multiply by a tuple.

df[['a','c']] = df[['a','c']] * (2,4)
print(df)

     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0

     a    b    c    d
0  2.0  1.0  4.0  1.0
1  2.0  1.0  4.0  1.0
2  2.0  1.0  4.0  1.0
3  2.0  1.0  4.0  1.0
4  2.0  1.0  4.0  1.0

CodePudding user response:

You can use df['col_name'].multiply(value) to apply on a whole column. The remaining columns can be modified in a loop of all columns except listm.

listm = ['Math', 'Physique', 'Bio']
for i, head in enumerate(listm):
    df[head] = df[head].multiply(5-i)

heads = df.head()
for head in heads:
    if not head in listm:
        df[head] = df[head].multiply(2)

CodePudding user response:

here is another way to do it using matrix multiplication

For sleuths matching the output, the data was not provided as a text, so created the test data in a patter of the screen shot

mul = [5,4,3,2,2,2,2,1]  # multipliers

df1=df.iloc[:,1:].mul(mul)
df1.total = df1.iloc[:,:7].sum(axis=1)
df.update(df1, join='left', overwrite=True)
df
    source  Math    Physics     Bio     Algo    Archi   Sport   eng     total
0     A     50.0    60.0       60.0     50.0    60.0    70.0    80.0    430.0
1     B     55.0    64.0       63.0     52.0    62.0    72.0    82.0    450.0
2     C     5.5     8.4         9.3     NaN     NaN     NaN     NaN     23.2
3     D     NaN     NaN         NaN     22.0    42.0    62.0    82.0    208.0
4     E     6.0     8.8         9.6     NaN     NaN     NaN     NaN     24.4
5     F     NaN     NaN         NaN     24.0    44.0    64.0    84.0    216.0

TEST DATA

data_out = [
    ['A', 10,15,20,25,30,35,40],
    ['B', 11,16,21,26,31,36,41],
    ['C', 1.1,2.1,3.1],
    ['D', np.NaN,np.NaN,np.NaN,11,21,31,41],
    ['E', 1.2,2.2,3.2],
    ['F', np.NaN,np.NaN,np.NaN,12,22,32,42],
]
df=pd.DataFrame(data_out, columns=[ 'source', 'Math', 'Physics', 'Bio', 'Algo', 'Archi', 'Sport', 'eng'])
df['total'] = df.iloc[:,1:].sum(axis=1)

    source  Math    Physics     Bio     Algo    Archi   Sport   eng     total
0     A     10.0    15.0       20.0     25.0    30.0    35.0    40.0    175.0
1     B     11.0    16.0       21.0     26.0    31.0    36.0    41.0    182.0
2     C     1.1     2.1         3.1     NaN     NaN     NaN     NaN     6.3
3     D     NaN     NaN         NaN     11.0    21.0    31.0    41.0    104.0
4     E     1.2     2.2         3.2     NaN     NaN     NaN     NaN     6.6
5     F     NaN     NaN         NaN     12.0    22.0    32.0    42.0    108.0
  • Related