I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df:
   a  b
0  1  3
1  2  4
I have a sample size N=5. I want to normalize the weights in the dataframe using
df.div(df.sum(axis=1), axis=0)
and enforce a constraint, such that none of the weights are greater than 1/sqrt(N).
Can this be done in one line?
CodePudding user response:
To normalize and ensure that no value exceeds a reference value, take the maximum of the normalized values and rescale everything by it, so that the largest weight equals exactly 1/sqrt(N) (the rows then no longer sum to 1):
import numpy as np
N = 5 # 1/np.sqrt(N) = 0.447214
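# Step 1: normalize each row so that it sums to 1
df2 = df.div(df.sum(axis=1), axis=0)
# Step 2: rescale so that the global maximum equals 1/np.sqrt(N)
df2 = df2.div(df2.values.max()*np.sqrt(N))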
Output:
          a         b
0  0.149071  0.447214
1  0.198762  0.397523
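As a quick sanity check, the two steps can be rerun with the constraint verified explicitly. This is only a sketch; the names cap, max_weight and row_sums are introduced here for illustration and are not part of the original answer. It also shows the side effect of the second division: every row sums to the same value, but that value is no longer 1.

```python
import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Step 1: normalize each row so that it sums to 1
df2 = df.div(df.sum(axis=1), axis=0)
# Step 2: rescale so that the global maximum equals 1/sqrt(N)
df2 = df2.div(df2.values.max() * np.sqrt(N))

cap = 1 / np.sqrt(N)              # upper bound on any single weight
max_weight = df2.values.max()     # largest weight after rescaling
row_sums = df2.sum(axis=1)        # per-row totals after rescaling

print(np.isclose(max_weight, cap))  # checks that the largest weight sits at the cap
print(row_sums.tolist())            # shows what each row now sums to
```

If the weights must both sum to 1 per row and respect the cap, a different approach (for example, clipping and redistributing the excess) would be needed.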
This takes two steps, and therefore two lines, because the second step depends on the result of the first.
Can you do it in one line? Yes, but should you?
By performing the same computation twice (inefficient):
N = 5
df2 = df.div(df.sum(axis=1), axis=0).div(df.div(df.sum(axis=1), axis=0).values.max()*np.sqrt(N))
By using an assignment expression (requires Python 3.8+, and is not as readable):
N = 5
df2 = (df2 := df.div(df.sum(axis=1), axis=0)).div(df2.values.max()*np.sqrt(N))
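A further one-line variant, not part of the original answer, might use DataFrame.pipe, which passes the intermediate result to a function. The sketch below assumes the same df and N as above; the lambda parameter d is just an illustrative name.

```python
import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# pipe hands the row-normalized frame to the lambda as `d`,
# so the normalization is computed only once
df2 = df.div(df.sum(axis=1), axis=0).pipe(lambda d: d.div(d.values.max() * np.sqrt(N)))
```

This avoids both the repeated computation and the assignment expression, at the cost of a lambda inside the chain.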
I would stick with the two lines.
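If the two-line version is needed in several places, one possibility is to wrap it in a small helper so that each call site is still a single line. This is a minimal sketch; normalize_with_cap is a hypothetical name chosen here, not a pandas function.

```python
import numpy as np
import pandas as pd

def normalize_with_cap(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Row-normalize df, then rescale so the largest value is 1/sqrt(n)."""
    # Step 1: each row sums to 1
    normalized = df.div(df.sum(axis=1), axis=0)
    # Step 2: the global maximum becomes 1/sqrt(n)
    return normalized.div(normalized.to_numpy().max() * np.sqrt(n))

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = normalize_with_cap(df, n=5)
```

The function body keeps the readable two-step form, while the caller gets the one-liner.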