I have a dataframe with 2 important columns.One of these important columns is the "Price" column and the other is the "Quantity" column.
My dataframe;
Price | Quantity | Total Quantity |
---|---|---|
5 | 500 | 4000 |
6 | 100 | 4000 |
7 | 400 | 4000 |
8 | 200 | 4000 |
9 | 200 | 4000 |
10 | 800 | 4000 |
10 | 200 | 4000 |
10 | 300 | 4000 |
10 | 300 | 4000 |
11 | 300 | 4000 |
12 | 300 | 4000 |
12 | 100 | 4000 |
13 | 200 | 4000 |
14 | 100 | 4000 |
My code;
#The type of 2 columns is set to float and the price column is divided by 100
data_state['Price'].apply(lambda x: float(x))
data_state['Quantity'].apply(lambda x: float(x))
data_state['Price'] = data_state['Price'] / 100
#price and quantity sorting smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=(True, True))
#Getting the sum of the quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()
#The total quantity column is multiplied by the value of "0.15" and the part to be subtracted from the total is found.
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15
#At this stage, I need to decrease the value that I found from the smallest to the largest, from the top and bottom of the ordered quantity column. I mean; for the quantity which are at the upper and lower bounds, only the part of Quantity which falls in central 70% are included in the calculation.
#The top and the bottom 15% of the "Total Quantity" are detected as outliers and removed from the "Quantity"
İn this dataframe Lower Bounds;
Lower Bounds: 4000 * 0.15 = 600 Quantity
İn this dataframe Upper Bounds;
Upper Bounds: 4000 * 0.15 = 600 Quantity
My expected output;
Price | Quantity | New Total Quantity |
---|---|---|
5 | 0 | 2800 |
6 | 0 | 2800 |
7 | 400 | 2800 |
8 | 200 | 2800 |
9 | 200 | 2800 |
10 | 800 | 2800 |
10 | 200 | 2800 |
10 | 300 | 2800 |
10 | 300 | 2800 |
11 | 300 | 2800 |
12 | 100 | 2800 |
12 | 0 | 2800 |
13 | 0 | 2800 |
14 | 0 | 2800 |
As you can see above, Numbers close to the "quantity" number from upper and lower in the quantity column corresponding to the number 600 (4000 * 0.15). In particular, I reduced the previously 300 figure, which corresponds to the price 12, to 100.
Thank you,
CodePudding user response:
You can use cumsum
to compute the cumulated sum on the forward and reverse array combined with clip
:
# get first value in Total Quantity column and multiply by desired factor
qty = df['Total Quantity'].iat[0]*0.15
# update Total Quantity column
df['Total Quantity'] -= 2*qty
## trim top
# compute the cumulated quantity and identify the value strictly lower than qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)
# select those rows and the one after (shift)
# remove the qty from the cumulated sum clipping negative values to zero
# and update the dataframe
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)
## trim bottom
# identical to above but on the reversed [::-1] array
cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)
output:
Price Quantity Total Quantity
0 5 0 2800.0
1 6 0 2800.0
2 7 400 2800.0
3 8 200 2800.0
4 9 200 2800.0
5 10 800 2800.0
6 10 200 2800.0
7 10 300 2800.0
8 10 300 2800.0
9 11 300 2800.0
10 12 100 2800.0
11 12 0 2800.0
12 13 0 2800.0
13 14 0 2800.0