Linear Recursion Vectorization-CodePudding

To vectorize the following mathematical expression (linear recursion):

f(i)=f(i-1)/c g(i), i starts at 1, f(0), c are constant numbers and given.

I can get better speed by using list-comprehension, that is:

def function():
        txt=     [0,2,0,2,0,2,0,2,2,2,0,2,2,2,0,2,0,2,2,2,0,2,0,2,0,2,0,2,2,2,0,2,0,2,0,2,0,2,2,2]  
        indices_0=[]
        vl=0
        sb_l=10
        CONST=512
        [vl := vl pow(txt[i],(i 1)) for i in range(sb_l)]
        if (vl==1876):
                        indices_0=[0]
        p=[i for i in range(1,len(txt)-sb_l 1) if (vl := (vl-txt[i-1])/2  txt[i sb_l-1]*CONST)==1876]
        print(indices_0 p)

function()

I am looking for a vectorized/faster than vectorization (if possible!) implementation of the above code in python/c.

Note:

A linear recursive function is a function that only makes a single call to itself each time the function runs (as opposed to one that would call itself multiple times during its execution). The factorial function is a good example of linear recursion.

Note that all variables array are given for demonstration purpose, the main part for vectorization is :

[vl := vl pow(txt[i],(i 1)) for i in range(sb_l)]
if (vl==1876):
            indices_0=[0]
            p=[i for i in range(1,len(txt)-sb_l 1) 
                 if (vl := (vl-txt[i-1])/2  txt[i sb_l-1]*CONST)==1876]

Here, f(i-1)= (vl-txt[i-1]), c=2, g(i)= txt[i sb_l-1]*CONST.

POSTSCRIPT: I am currently doing it in python, would it be much faster if it is implemented in C language's vectorization?

CodePudding user response：

Here is an example of equivalent C program doing the same thing but faster:

#include <stdio.h>
#include <stdlib.h>

// See: https://stackoverflow.com/questions/29787310/does-pow-work-for-int-data-type-in-c
int64_t int_pow(int64_t base, int exp)
{
    int64_t result = 1;
    while (exp)
    {
        // Branchless optimization: result *= base * (exp % 2);
        if(exp % 2)
            result *= base;
        exp /= 2;
        base *= base;
    }
    return result;
}

void function()
{
    // Both txt values and sb_l not be too big or it will cause silent overflows (ie. wrong results)
    const int64_t txt[] = {0,2,0,2,0,2,0,2,2,2,0,2,2,2,0,2,0,2,2,2,0,2,0,2,0,2,0,2,2,2,0,2,0,2,0,2,0,2,2,2};
    const size_t txtSize = sizeof(txt) / sizeof(txt[0]);
    const int sb_l = 10;
    const int64_t CONST = 512;
    int64_t vl = 0;
    int64_t* results = (int64_t*)malloc(txtSize * sizeof(int64_t));
    size_t cur = 0;

    // Optimization so not to compute pow(0,i 1) which is 0
    for (int i = 0; i < sb_l;   i)
        if(txt[i] != 0)
            vl  = int_pow(txt[i], i 1);

    if (vl == 1876)
    {
        results[cur] = 0;
        cur  ;
    }

    for (int i = 1; i < txtSize-sb_l 1;   i)
    {
        vl = (vl - txt[i-1]) / 2   txt[i sb_l-1] * CONST;

        if(vl == 1876)
            results[cur  ] = i;
    }

    // Printing
    printf("[");
    for (int i = 0; i < cur;   i)
    {
        if(i > 0)
            printf(", ");
        printf("%ld", results[i]);
    }
    printf("]\n");
    fflush(stdout);

    free(results);
}


int main(int argc, char* argv[])
{
    function();
    return 0;
}

Be careful with overflows. You can put assertions if you are unsure about that in specific places (note they make the code slower when enabled though). Please do not forget to compile the program with optimizations (eg. -O3 with GCC and Clang and /O2 with MSVC).