Reallocate the fraction of weights above threshold to the other weights while maintaining the sum pe-CodePudding

I have a dataframe df1 with Date and ID as index and the Weight. I want to set an upper weight limit (30%) of the weights per date. The weights on each day add up to 100% and if I set an upper weight limit, it is the case that the next biggest weight is then bigger than the weight limit of 30%. Is there a way to account for that without doing several iterations? The remaining weight sum which are not bigger than the max weight add up to: 100% - number of max weights reached.

df1:
      Date ID  Weight
2023-01-30  A    0.45  <-- over max weight of 30%
2023-01-30  B    0.25
2023-01-30  C    0.15
2023-01-30  D    0.10
2023-01-30  E    0.05
2023-01-31  A    0.55
2023-01-31  B    0.25
2023-01-31  C    0.20
2023-01-31  D    0.00
2023-01-31  E    0.00

df1:
      Date ID  Weight  Weight_upper
2023-01-30  A    0.45         0.300 <-- set to max weight
2023-01-30  B    0.25         0.318 <-- bigger than max weight
2023-01-30  C    0.15         0.191
2023-01-30  D    0.10         0.127 (ex calculation: 0.1 * (1 - 0.3)/(0.25 0.15 0.1 0.05)
2023-01-30  E    0.05         0.060
2023-01-31  A    0.55         0.300
2023-01-31  B    0.25         0.389
2023-01-31  C    0.20         0.311
2023-01-31  D    0.00         0.000
2023-01-31  E    0.00         0.000

For reproducibility:

df = pd.DataFrame({
    'Date':['2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31', '2023-01-31', '2023-01-31', '2023-01-31'],
    'ID':['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
    'Weight':[0.45, 0.25, 0.15, 0.1, 0.05, 0.55, 0.25, 0.2, 0, 0]})
df.set_index('Date')

Many thanks for your help!

CodePudding user response：

The logic is unclear, so I'll assume you want to allocate the fraction of the weights above the desired max (0.3) to the other weights in a way that doesn't cause any other weight to become above threshold.

I would compute the difference to the threshold, then split the values into above/below and allocate the extra weight proportionally to the available space for each weight below threshold:

max_weight = 0.3

df2 = df.assign(diff=df['Weight'].sub(max_weight),
                mask=lambda d: d['diff'].gt(0),
                above=lambda d: d['diff'].where(d['mask']),
                below=lambda d: d['diff'].mask(d['mask']),
                )
g = df2.groupby('Date')

df['Weight_upper'] = (df2['below']
                      .div(g['below'].transform('sum'))
                      .mul(g['above'].transform('sum'))
                      .add(df['Weight'])
                      .fillna(max_weight)
                     )

print(df)

Output:

         Date ID  Weight  Weight_upper
0  2023-01-30  A    0.45      0.300000
1  2023-01-30  B    0.25      0.261538
2  2023-01-30  C    0.15      0.184615
3  2023-01-30  D    0.10      0.146154
4  2023-01-30  E    0.05      0.107692
5  2023-01-31  A    0.55      0.300000
6  2023-01-31  B    0.25      0.266667
7  2023-01-31  C    0.20      0.233333
8  2023-01-31  D    0.00      0.100000
9  2023-01-31  E    0.00      0.100000

Intermediates:

         Date ID  Weight  diff   mask  above  below  Weight_upper
0  2023-01-30  A    0.45  0.15   True   0.15    NaN      0.300000
1  2023-01-30  B    0.25 -0.05  False    NaN  -0.05      0.261538
2  2023-01-30  C    0.15 -0.15  False    NaN  -0.15      0.184615
3  2023-01-30  D    0.10 -0.20  False    NaN  -0.20      0.146154
4  2023-01-30  E    0.05 -0.25  False    NaN  -0.25      0.107692
5  2023-01-31  A    0.55  0.25   True   0.25    NaN      0.300000
6  2023-01-31  B    0.25 -0.05  False    NaN  -0.05      0.266667
7  2023-01-31  C    0.20 -0.10  False    NaN  -0.10      0.233333
8  2023-01-31  D    0.00 -0.30  False    NaN  -0.30      0.100000
9  2023-01-31  E    0.00 -0.30  False    NaN  -0.30      0.100000