Optimalization of my script whcich calculate weekly qty of product-CodePudding

I have a task where I need to change multiple times data in my data frame. I wrote the answer in Jupyter notebook, using loops and it's take around 2,5min to run.

However, when I rewrite my code to pycharm using modules and definitions it takes around 20min and I do not know where I made a mistake.

Here is explanation of my task and my idea which was written in Jupyter, maybe you will have some ideas how I could write it better.

I have a data frame with weekly qty of sold toys in factory when 0w last week.

ID  0w  1w  2w  3w  4w  5w  6w  7w  8w  9w  10w 11w 12w 13w 
0   0   1   0   0   5   1   65  2   62  1   1   2   1   60
1   0   0   1   5   16  0   2   0   0   40  0   100 0   0
2   0   3   0   0   0   0   0   40  0   0   20  0   0   0
3   0   5   6   0   0   0   0   0   0   0   0   0   0   0
4   0   1   0   0   0   0   0   0   0   0   0   0   0   0

First step is to save every row from my df to a list of lists 'week_qty':

week_qty = []
lenOfRows = len(copiedData)
for i in range(0, lenOfRows):
    week_qty.append(weeksQtyEXTdata.iloc[i])

week_qty[0] = [0    1   0   0   5   1   65  2   62  1   1   2   1   60]

Second step is to take 90% and 10% value of each row and compare with it each value of the list, so for the first row 90% = 61.4 and 10% = 0. If the value in cell is lower than p10 I change it to the value of p10 and if it' higher than p90 I change it with the value of p90.

def CalcPercenatage(week_qty,oneWeek):
    p10=np.percentile(weekDemand,10)
    p90=np.percentile(weekDemand,90)
    if (oneWeek < p10):
        return p10
    elif(oneWeek > p90):
        return p90
    else:
        return oneWeek

CalcPercenatage(week_qty[0]) = [60, 1, 2, 1, 1, 61.4, 2, 61.4, 1, 5, 0, 0, 1, 0]

Last step is to create a matrix of those values and do it for every row for each of 14 cells in a row:

for i in range(0, lenOfRows):
    Rows = []
    for j in range(0, 14):
        Rows.append(CalcPercenatage(week_qty[i], week_qty[i][j]))
    MatrixBetweenWeeks.append(Rows)

I would like to make it faster, for 31000 data in pycharm it is working too long.

CodePudding user response：

You can use clip:

p10, p90 = np.percentile(df.iloc[:, 1:], [10, 90], axis=1)
out = df.iloc[:, 1:].clip(p10, p90, axis=0)
out['Average'] = out.mean(axis=1)
out = pd.concat([df.iloc[:, :1], out], axis=1)

Output:

>>> out
   ID  0w   1w   2w  3w  4w  5w    6w    7w    8w    9w   10w   11w  12w  13w    Average
0   0   0  1.0  0.0   0   5   1  61.4   2.0  61.4   1.0   1.0   2.0    1   60  14.057143
1   1   0  0.0  1.0   5  16   0   2.0   0.0   0.0  32.8   0.0  32.8    0    0   6.400000
2   2   0  3.0  0.0   0   0   0   0.0  14.9   0.0   0.0  14.9   0.0    0    0   2.342857
3   3   0  3.5  3.5   0   0   0   0.0   0.0   0.0   0.0   0.0   0.0    0    0   0.500000
4   4   0  0.0  0.0   0   0   0   0.0   0.0   0.0   0.0   0.0   0.0    0    0   0.000000

Performance

For 31K records:

%timeit myfunc(df)
15.3 ms ± 80 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)