How to efficiently and correctly implement numba jit decorator or apply vectorization instead of a f-CodePudding

Atttempted to implement jit decorator to increase the speed of execution of my code. Not getting proper results. It is throughing all sorts of errors.. Key error, type errors, etc.. The actual code without numba is working without any issues.

# The Code without numba is:
df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

df['A'] = 0.0
df['B'] = 0.0
df['Counter'] = 0
df['Counter'][0] = df['Serial'][0]

for i in range(0,len(df)-1):
    # Filling Column 'A'
    if (df.iloc[1 i,2] > df.iloc[1 i,1]) & (df.iloc[i,5] > df.iloc[1 i,1]) & (df.iloc[1 i,3] >0):
        df.iloc[1 i,4] = round((df.iloc[1 i,1]*1.02),2)
    elif (df.iloc[1 i,2] < df.iloc[1 i,1]) & (df.iloc[i,5] < df.iloc[1 i,1]) & (df.iloc[1 i,3] <0):
        df.iloc[1 i,4] = round((df.iloc[1 i,1]*0.98),2)
    else:
        df.iloc[1 i,4] = df.iloc[i,4]
    # Filling Column 'B'
    df.iloc[1 i,5] = round(((df.iloc[1 i,1]   df.iloc[1 i,2])/2),2) 
    # Filling Column 'Counter'
    if (df.iloc[1 i,5] > df.iloc[1 i,1]):
        df.iloc[1 i,6] = df.iloc[1 i,0]
    else:
        df.iloc[1 i,6] = df.iloc[i,6]
df

The below code is giving me the error. where i tried to implement numba jit decorator to speed up the original python code.

#The code with numba jit which is throwing error is:
df = pd.DataFrame()
df['Serial']=[865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value']=[586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref']=[586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]
from numba import jit
@jit(nopython=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n, dtype='f8')
    B = np.empty(n, dtype='f8')
    Counter = np.empty(n, dtype='f8')
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
        # Filling Column 'A'
        if (Ref[i 1] > Value[i 1]) & (B[i] > Value[i 1]) & (Base[i 1] > 0):
            A[i 1] = round((Value[i 1]*1.02),2)
        elif (Ref[i 1] < Value[i 1]) & (B[i] < Value[i 1]) & (Base[i 1] < 0):  
            A[i 1] = round((Value[i 1]*0.98),2)
        else:
            A[i 1] = A[i]
        # Filling Column 'B'
        B[i 1] = round(((Value[i 1]   Ref[i 1])/2),2)
        # Filling Column 'Counter'
        if (B[i 1] > Value[i 1]):
            Counter[i 1] = Serial[i 1]
        else:
            Counter[i 1] = Counter[i]   
    List = [A,B,Counter]        
    return List

Serial = df['Serial'].values.astype(np.float64)
Value = df['Value'].values.astype(np.float64)
Ref = df['Ref'].values.astype(np.float64)
Base = df['Base'].values.astype(np.float64)

VCal = Calcs(Serial,Value,Ref,Base)

df['A'].values[:] = VCal[0].astype(object)
df['B'].values[:] = VCal[1].astype(object)
df['Counter'].values[:] = VCal[2].astype(object)
df

I tried to modify the code as per the guidance provided by @Jérôme Richard for the question

CodePudding user response：

You can only use df['A'].values[:] if the column A exists in the dataframe. Otherwise you need to create a new one, possibly with df['A'] = ....

Moreover, the trick with astype(object) applies for string but not for numbers. Indeed, string-based dataframe columns do apparently not use Numpy string-based array but Numpy object-based arrays containing CPython strings. For numbers, Pandas properly uses number-based arrays. Converting numbers back to object is inefficient. The same applies for astype(np.float64): it is not needed if the time is already fine. This is the case here. If you are unsure about the input type, you can let them though as they are not very expensive.

The Numba function itself is fine (at least with a recent version of Numba). Note that you can specify the signature to compile the function eagerly. This feature also help you to catch typing errors sooner and make them a bit more clear. The downside is that it makes the function less generic as only specific types are supported (though you can specify multiple signatures).

from numba import njit

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])')
def Calcs(Serial,Value,Ref,Base):
    [...]

Serial = df['Serial'].values
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values

VCal = Calcs(Serial, Value, Ref, Base)

df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]

Note that you can use the Numba decorator flag fastmath=True to speed up the computation if you are sure that the input array never contains spacial values like NaN or Inf or -0, and you do not rely on FP-math associativity.

CodePudding user response：

This Code is giving list to list convertion typeerror.

from numba import njit
df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])',fastmath=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n)
    B = np.empty(n)
    Counter = np.empty(n)
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
    # Filling Column 'A'
        if (Ref[i 1] > Value[i 1]) & (B[i] > Value[i 1]) & (Base[i 1] > 0):
            A[i 1] = round((Value[i 1]*1.02),2)
        elif (Ref[i 1] < Value[i 1]) & (B[i] < Value[i 1]) & (Base[i 1] < 0):  
            A[i 1] = round((Value[i 1]*0.98),2)
        else:
            A[i 1] = A[i]
    # Filling Column 'B'
        B[i 1] = round(((Value[i 1]   Ref[i 1])/2),2)
    # Filling Column 'Counter'
        if (B[i 1] > Value[i 1]):
            Counter[i 1] = Serial[i 1]
        else:
           Counter[i 1] = Counter[i]   
    List = [A,B,Counter]        
    return List

Serial = df['Serial'].values
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values
VCal = Calcs(Serial, Value, Ref, Base)
df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]
df

Getting the below error.

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No conversion from list(array(float64, 1d, C))<iv=None> to 
list(array(float64, 1d, A))<iv=None> for '$408return_value.5', defined at None

File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
    <source elided>
    List = [A,B,Counter]        
    return List
    ^

During: typing of assignment at <ipython-input-9-3c9e0fe02b75> (33)

File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
    <source elided>
    List = [A,B,Counter]        
    return List
    ^