I am implementing an encryption algorithm and was wondering if there was a more efficient way than O(n) for xoring two unsigned char arrays? I essentially want to know if it's possible to avoid doing something like this:
unsigned char a1[64];
unsigned char a2[64];
unsigned char result[64];
for (int i = 0; i < 64; i++)
{
result[i] = a1[i] ^ a2[i];
}
Like I said, I know there's nothing inherently wrong with doing this but I was wondering if there was possibly a more efficient method. I need to keep my algorithm as streamlined as possible.
Thanks in advance for any information!
CodePudding user response:
As the comments note, the only way to do this faster than O(n) is to not do it for all elements. In fact, don't do it for any elements!
The reason is that you're writing a cryptographic algorithm. You'll use result[i] a few lines lower. That part will likely be computationally expensive, while this XOR is limited by memory bandwidth. If you replace result[i] with a1[i] ^ a2[i] in the cryptographic operation, the CPU is likely to overlap the memory access and the computation.
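To illustrate the idea of folding the XOR into the step that consumes it, here is a minimal sketch. The question does not show the actual cryptographic operation, so `mix_step` is a hypothetical stand-in (an FNV-1a style mix, not a real cipher), and `consume_two_pass`, `consume_fused` and `kBlockSize` are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockSize = 64;

// Hypothetical stand-in for the expensive cryptographic step
// (FNV-1a style multiply-and-mix; NOT a real cipher).
inline std::uint32_t mix_step(std::uint32_t state, unsigned char byte)
{
    return (state ^ byte) * 16777619u;
}

// Two-pass version: write the XOR into result[], then read it back.
std::uint32_t consume_two_pass(const unsigned char* a1, const unsigned char* a2)
{
    unsigned char result[kBlockSize];
    for (std::size_t i = 0; i < kBlockSize; ++i)
        result[i] = static_cast<unsigned char>(a1[i] ^ a2[i]);

    std::uint32_t state = 2166136261u;
    for (std::size_t i = 0; i < kBlockSize; ++i)
        state = mix_step(state, result[i]);
    return state;
}

// Fused version: the XOR is computed where it is used, so no
// intermediate array is stored or reloaded.
std::uint32_t consume_fused(const unsigned char* a1, const unsigned char* a2)
{
    std::uint32_t state = 2166136261u;
    for (std::size_t i = 0; i < kBlockSize; ++i)
        state = mix_step(state, static_cast<unsigned char>(a1[i] ^ a2[i]));
    return state;
}
```

Both functions return the same value; whether the fused form is measurably faster depends on the real consuming operation and on the compiler, so it is worth benchmarking.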
CodePudding user response:
With AVX512 it will look like this:
//unsigned char a1[64];
//unsigned char a2[64];
//unsigned char result[64];
#include <cstdint>
#include <immintrin.h>
#include <iostream>
#include <iomanip>
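// Overlay 64 bytes on one 512-bit register so the result can be printed byte by byte.
// Note: reading a union member other than the one last written is not guaranteed
// by the C++ standard; GCC documents it as an extension.
union int512
{
std::uint8_t bytes[64]{};
__m512i value; // integer vector type, so only AVX512F is required
};
std::ostream& operator<<(std::ostream& os, const int512& value)
{
bool comma = false;
for (const auto& byte : value.bytes)
{
if (comma) os << ", ";
os << std::hex << "0x" << static_cast<int>(byte);
comma = true;
}
return os;
}
int main()
{
int512 data1;
int512 data2;
std::uint8_t n{ 0u };
for (auto& byte : data1.bytes) byte = n++; // fill with 0, 1, ..., 63
for (auto& byte : data2.bytes) byte = 0xff;
int512 result;
// XOR all 64 bytes with a single instruction
result.value = _mm512_xor_si512(data1.value, data2.value);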
std::cout << "data1 = " << data1 << "\n";
std::cout << "data2 = " << data2 << "\n";
std::cout << "xor = " << result << "\n";
return 0;
}
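If the data already lives in plain unsigned char arrays, as in the question, the same operation can be written with unaligned loads and stores instead of a union. The following is only a sketch: `xor64` is a hypothetical helper name, and the scalar fallback is an assumption added so the code still builds when AVX-512 is not enabled at compile time (for example without `-mavx512f`).

```cpp
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: XORs exactly 64 bytes of a1 and a2 into result.
void xor64(const unsigned char* a1, const unsigned char* a2, unsigned char* result)
{
#if defined(__AVX512F__)
    // Unaligned loads: the arrays need no special alignment.
    const __m512i x = _mm512_loadu_si512(a1);
    const __m512i y = _mm512_loadu_si512(a2);
    _mm512_storeu_si512(result, _mm512_xor_si512(x, y));
#else
    // Scalar fallback when AVX-512 is not enabled for this build.
    for (std::size_t i = 0; i < 64; ++i)
        result[i] = static_cast<unsigned char>(a1[i] ^ a2[i]);
#endif
}
```

It would be called as `xor64(a1, a2, result);` with the three 64-byte arrays from the question.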
CodePudding user response:
And maybe, with luck, the good old std::valarray will also be very fast.
#include <valarray>
std::valarray<char> a1(64);
std::valarray<char> a2(64);
std::valarray<char> result(64);
int main() {
result = a1 ^ a2;
}
But I think that a good optimizing compiler will outperform all our handwritten optimization attempts.
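As a portable middle ground, here is a sketch of the plain byte loop next to a version that XORs eight bytes at a time. The function names `xor_bytes` and `xor_words` are made up for this example, and `xor_words` assumes the length is a multiple of 8 (true for the 64-byte arrays in the question). Optimizing compilers commonly turn the byte loop into vector instructions on their own at higher optimization levels, so inspect the generated assembly before assuming the manual version is any faster.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Plain byte loop: simple, and a typical candidate for auto-vectorization.
void xor_bytes(const unsigned char* a1, const unsigned char* a2,
               unsigned char* result, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        result[i] = static_cast<unsigned char>(a1[i] ^ a2[i]);
}

// Word-at-a-time loop. Assumption: n is a multiple of sizeof(std::uint64_t).
// std::memcpy avoids alignment and strict-aliasing problems.
void xor_words(const unsigned char* a1, const unsigned char* a2,
               unsigned char* result, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += sizeof(std::uint64_t)) {
        std::uint64_t x;
        std::uint64_t y;
        std::memcpy(&x, a1 + i, sizeof x);
        std::memcpy(&y, a2 + i, sizeof y);
        x ^= y;
        std::memcpy(result + i, &x, sizeof x);
    }
}
```

Both are still O(n) in the number of bytes; the second only reduces the number of loop iterations.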