How do I improve the python function for boxcox transformation?

Time:12-28

I wrote a function that gives a fair estimate of the lambda coefficient for a given series/list of data. However, it takes a long time when the input is long. Are there any tips to speed it up?

This is my code:

import numpy as np
from scipy.stats import norm, pearsonr

def get_lambda_coef(series):
    x=[series[i] for i in range(len(series))]
    for i in range(len(x)-1):
        for j in range(len(x)-1):
            if x[j]>=x[j+1]:
                z=x[j]
                x[j]=x[j+1]
                x[j+1]=z
    i=[j for j in range(1,len(x)+1)]
    f=[(i[j]-0.375)/(len(x)+0.25) for j in range(len(x))]
    u=[norm.ppf(f[i]) for i in range(len(x))]

    lambda_coef=0
    width=3
    step=width/6
    k=lambda_coef-width
    iteration=1
    while iteration<=15:
        r_vector=[]
        lambda_vect=[]
        while k<=lambda_coef+width:
            if k==0:
                y=[np.log(i) for i in x]
            else:
                y=[(i**k-1)/k for i in x]
            r_vector.append(pearsonr(y, u)[0])
            k+=step
        k=lambda_coef-width
        while k<=lambda_coef+width:
            lambda_vect.append(k)
            k+=step
        lambda_coef=lambda_vect[r_vector.index(max(r_vector))]
        width/=2
        step/=3
        k=lambda_coef-width
        iteration+=1
    normalized = [(x**lambda_coef - 1)/lambda_coef for x in series]
    return (normalized, lambda_coef)

Any help from your side will be highly appreciated (I upvote all answers).

Thank you !

CodePudding user response:

As far as I can see, the first part of your function sorts the list with nested loops (a bubble sort). The time complexity of that part is

O(n**2)

You can replace this code with the built-in sorted() function:

x=[series[i] for i in range(len(series))]
for i in range(len(x)-1):
    for j in range(len(x)-1):
        if x[j]>=x[j+1]:
            z=x[j]
            x[j]=x[j+1]
            x[j+1]=z

The time complexity of sorted() is O(n log n):

x=sorted(series)
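
Going one step further, the list comprehensions that follow the sort could probably be vectorized with NumPy as well. The sketch below is only an illustration under that assumption, not a tested drop-in replacement: the helper names normal_scores and boxcox_transform are hypothetical, while the (rank - 0.375)/(n + 0.25) formula and the log/power transform are taken from the question.

```python
import numpy as np
from scipy.stats import norm

def normal_scores(series):
    """Return the sorted data and the matching normal quantiles (hypothetical helper)."""
    x = np.sort(np.asarray(series, dtype=float))  # O(n log n) sort instead of nested loops
    n = len(x)
    ranks = np.arange(1, n + 1)                   # 1, 2, ..., n
    f = (ranks - 0.375) / (n + 0.25)              # same formula as in the question
    u = norm.ppf(f)                               # one call on the whole array
    return x, u

def boxcox_transform(x, k):
    """Box-Cox transform of a positive NumPy array for one candidate lambda k (hypothetical helper)."""
    return np.log(x) if k == 0 else (x**k - 1) / k
```

Inside the search loop, y=boxcox_transform(x, k) would then replace the two list comprehensions that build y, so each candidate lambda costs a single array operation instead of a Python-level loop.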