Home > front end >  Is there a fast way to pack multiple digits into a number?
Is there a fast way to pack multiple digits into a number?

Time:03-22

Consider 8 digit characters like 12345678 as a string. It can be converted to a number where every byte contains a digit like this:

const char* const str = "12345678";
const char* const base = "00000000";
const uint64_t unpacked = *reinterpret_cast<const uint64_t*>(str)
    - *reinterpret_cast<const uint64_t*>(base);

Then unpacked will be 0x0807060504030201 on a little-endian system.

What is the fastest way to convert the number into 12345678, perhaps by multiplying it by some magic number or using SIMD up to AVX2?

UPDATE: 12345678 has to be a number stored in a 32-bit or 64-bit integer, not a string.

CodePudding user response:

size_t len = strlen(str);
uint64_t unpacked = 0;
for (size_t i = 0; i < len; i  ) {
    uint8_t val = (uint8_t)(str[i] -'0');
    unpacked = (unpacked << 8) | val;
}

That will convert "12345678" into 0x0102030405060708

If you want the other endianness, replace the val = statement in the loop above with this:

    uint8_t val = (uint8_t)(str[len - i - 1] -'0');

You could also do the loop unrolling yourself and risk some potentially undefined behavior:

uint8_t* ptr = (uint8_t*)(&unpacked);
ptr[0] = str[0] - '0';
ptr[1] = str[1] - '0';
ptr[2] = str[2] - '0';
ptr[3] = str[3] - '0';
ptr[4] = str[4] - '0';
ptr[5] = str[5] - '0';
ptr[6] = str[6] - '0';
ptr[7] = str[7] - '0';

CodePudding user response:

"Fastest" can be defined in two different manners:

  • "fastest to be coded"
  • "fastest to be executed"

Fastest to be coded

#include <string> 

std::string s = std::to_string(42);

See Easiest way to convert int to string in C

Fastest to be executed

Since your number contains digit characters and we know that a character digit can be converted into its corresponding character value with

char result = input   '0';

Now, you can build a char array like this:

char[] result = new char[8](); //8 is the number of digits in this case

for (int digit = 8 - 1; digit >= 0; digit--) {
    result[digit] = (input % 10)   '0';
    input /= 10;
}

Finally, you can convert your char array into a String, like this:

https://www.geeksforgeeks.org/convert-character-array-to-string-in-c/

CodePudding user response:

If you change your input format to breadth-first element order like this:

Sample 9 numbers, interleaved
digit[]:
1 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2 2 ...
... 3 3 3 3 3 3 3 3 3 ....

for(int j=0; j<num_parse; j =9)
{
    for(int i=0; i<9; i  )
    {
        value[i]  = 
          (multiplier[i]*=10)*
          (digit[i j]-'0');
    }
    // value vector copied to output
    // clear value & multiplier vectors
}

And if you convert more than just 9 values, like 512 or 8192 with padding to any multiple of 32, compiler should vectorize it.

To prepare input, you can use 8 different channels, 1 per digit of every parsed value.

  • Related