Home > OS >  How to remove upper and lower bounds with Python
How to remove upper and lower bounds with Python

Time:04-14

I have a dataframe with 2 important columns.One of these important columns is the "Price" column and the other is the "Quantity" column.

My dataframe;

Price Quantity Total Quantity
5 500 4000
6 100 4000
7 400 4000
8 200 4000
9 200 4000
10 800 4000
10 200 4000
10 300 4000
10 300 4000
11 300 4000
12 300 4000
12 100 4000
13 200 4000
14 100 4000

My code;

#The type of 2 columns is set to float and the price column is divided by 100
data_state['Price'].apply(lambda x: float(x))
data_state['Quantity'].apply(lambda x: float(x))
data_state['Price'] = data_state['Price'] /  100 

#price and quantity sorting smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=(True, True))

#Getting the sum of the quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()

#The total quantity column is multiplied by the value of "0.15" and the part to be subtracted from the total is found.
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15

#At this stage, I need to decrease the value that I found from the smallest to the largest, from the top and bottom of the ordered quantity column. I mean; for the quantity which are at the upper and lower bounds, only the part of Quantity which falls in central 70% are included in the calculation.

#The top and the bottom 15% of the "Total Quantity" are detected as outliers and removed from the "Quantity"

İn this dataframe Lower Bounds;

Lower Bounds: 4000 * 0.15 = 600 Quantity

İn this dataframe Upper Bounds;

Upper Bounds: 4000 * 0.15 = 600 Quantity

My expected output;

Price Quantity New Total Quantity
5 0 2800
6 0 2800
7 400 2800
8 200 2800
9 200 2800
10 800 2800
10 200 2800
10 300 2800
10 300 2800
11 300 2800
12 100 2800
12 0 2800
13 0 2800
14 0 2800

As you can see above, Numbers close to the "quantity" number from upper and lower in the quantity column corresponding to the number 600 (4000 * 0.15). In particular, I reduced the previously 300 figure, which corresponds to the price 12, to 100.

Thank you,

CodePudding user response:

You can use cumsum to compute the cumulated sum on the forward and reverse array combined with clip:

# get first value in Total Quantity column and multiply by desired factor
qty = df['Total Quantity'].iat[0]*0.15

# update Total Quantity column
df['Total Quantity'] -= 2*qty

## trim top

# compute the cumulated quantity and identify the value strictly lower than qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)

# select those rows and the one after (shift)
# remove the qty from the cumulated sum clipping negative values to zero
# and update the dataframe
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)


## trim bottom
# identical to above but on the reversed [::-1] array

cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
df.loc[m|m.shift(), 'Quantity'] = cs.loc[m|m.shift()].sub(qty).clip(0)

output:

    Price  Quantity  Total Quantity
0       5         0          2800.0
1       6         0          2800.0
2       7       400          2800.0
3       8       200          2800.0
4       9       200          2800.0
5      10       800          2800.0
6      10       200          2800.0
7      10       300          2800.0
8      10       300          2800.0
9      11       300          2800.0
10     12       100          2800.0
11     12         0          2800.0
12     13         0          2800.0
13     14         0          2800.0
  • Related