How to remove upper and lower bounds with Python-CodePudding

I have a dataframe with 2 important columns.One of these important columns is the "Price" column and the other is the "Quantity" column.

My dataframe;

Price	Quantity	Total Quantity
5	500	4000
6	100	4000
7	400	4000
8	200	4000
9	200	4000
10	800	4000
10	200	4000
10	300	4000
10	300	4000
11	300	4000
12	300	4000
12	100	4000
13	200	4000
14	100	4000

My code;

#The type of 2 columns is set to float and the price column is divided by 100
data_state['Price'].apply(lambda x: float(x))
data_state['Quantity'].apply(lambda x: float(x))
data_state['Price'] = data_state['Price'] /  100 

#price and quantity sorting smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=(True, True))

#Getting the sum of the quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()

#The total quantity column is multiplied by the value of "0.15" and the part to be subtracted from the total is found.
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15

#At this stage, I need to decrease the value that I found from the smallest to the largest, from the top and bottom of the ordered quantity column. I mean; for the quantity which are at the upper and lower bounds, only the part of Quantity which falls in central 70% are included in the calculation.

#The top and the bottom 15% of the "Total Quantity" are detected as outliers and removed from the "Quantity"

İn this dataframe Lower Bounds;

Lower Bounds: 4000 * 0.15 = 600 Quantity

İn this dataframe Upper Bounds;

Upper Bounds: 4000 * 0.15 = 600 Quantity

My expected output;

Price	Quantity	New Total Quantity
5	0	2800
6	0	2800
7	400	2800
8	200	2800
9	200	2800
10	800	2800
10	200	2800
10	300	2800
10	300	2800
11	300	2800
12	100	2800
12	0	2800
13	0	2800
14	0	2800

As you can see above, Numbers close to the "quantity" number from upper and lower in the quantity column corresponding to the number 600 (4000 * 0.15). In particular, I reduced the previously 300 figure, which corresponds to the price 12, to 100.

Thank you,

CodePudding user response：

You can use cumsum to compute the cumulated sum on the forward and reverse array combined with clip:

# get first value in Total Quantity column and multiply by desired factor
qty = df['Total Quantity'].iat[0]*0.15

# update Total Quantity column
df['Total Quantity'] -= 2*qty

## trim top

# compute the cumulated quantity and identify the value strictly lower than qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)

# select those rows and the one after (shift)
# remove the qty from the cumulated sum clipping negative values to zero
# and update the dataframe
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)


## trim bottom
# identical to above but on the reversed [::-1] array

cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)

output:

    Price  Quantity  Total Quantity
0       5         0          2800.0
1       6         0          2800.0
2       7       400          2800.0
3       8       200          2800.0
4       9       200          2800.0
5      10       800          2800.0
6      10       200          2800.0
7      10       300          2800.0
8      10       300          2800.0
9      11       300          2800.0
10     12       100          2800.0
11     12         0          2800.0
12     13         0          2800.0
13     14         0          2800.0