Home > database >  Relative weight within row with max value in python DataFrame
Relative weight within row with max value in python DataFrame

Time:12-06

I try to calculate the relative weights of df1 in each row with a max value of 0.5. So far, I was able to calculate the relative weights in df2 but without an upper boundary. Here would be a simple example:

import pandas as pd
df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1':[0,0,2,1,1], 
    'ID2':[1,3,1,1,2],
    'ID3':[1,0,0,1,0],
    'ID4':[1,1,7,1,0],
    'ID5':[0,6,0,0,1]})

df1:
    Dates       ID1 ID2 ID3 ID4 ID5
0   2021-01-01  0   1   1   1   0
1   2021-01-02  0   3   0   1   6
2   2021-01-03  2   1   0   7   0
3   2021-01-04  1   1   1   1   0
4   2021-01-05  1   2   0   0   1
df1 = df1.set_index('Dates').T
df2 = df1.transform(lambda x: x/sum(x)).T
df2.round(2)

df2:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.60
2021-01-03  0.20    0.10    0.00    0.70    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

I try to get df3 with a relative weight maximum of 0.5.

df3:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.50
2021-01-03  0.20    0.10    0.00    0.50    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

When I use the following adjusted function, I get the error: Transform function failed

df1.transform(lambda x: x/sum(x) if x/sum(x) < 0.5 else 0.5).T

Thanks a lot!

CodePudding user response:

You can use apply(...,axis=1) and clip the values with a max of 0.5 (this assumes Date is always the first columns - alternatively, we could set it as an index):

df1[df1.columns[1:]] = df1[df1.columns[1:]].apply(lambda x:x/sum(x), axis=1).clip(upper=0.5)

CodePudding user response:

for col in df1.columns:
   df1[col] = df1[col].apply(lambda x: x/sum(df1[col]) if x/sum(df1[col]) < 0.5 else 0.5)

Have fun!

  • Related