Reading an array of bytes into UTF-16 characters on a machine with a specific UTF-16 character size

Time:09-27

I have a question about how char16_t characters interact with SHA-256 hashes generated by OpenSSL.

The thing is, I'm currently writing code that deals with password hashing. I've generated a 256-bit hash, and I want to store it in the database in a UTF-16 encoded character field. In my C code, I use char16_t to store such data. However, there is a problem: char16_t can be wider than 16 bits, depending on the machine it ends up on. So if I use memcpy() to copy the bytes of my SHA-256 hash into it, the result may be garbled on some machines.

What should I do in this situation? Read bytes differently, store hashes in the database differently, maybe something else?
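To make the memcpy() concern concrete, here is a small C++ sketch (the function names are made up for illustration). Copying two hash bytes into a char16_t with memcpy() gives a value that depends on the machine's byte order, while combining them with shifts gives the same value everywhere. The memcpy() part assumes a 2-byte char16_t, which the static_assert checks.

```cpp
#include <cstring>

// memcpy copies raw bytes, so the resulting char16_t value depends on the
// machine's byte order (this demo also assumes a 2-byte char16_t).
char16_t load_via_memcpy(const unsigned char bytes[2]) {
    static_assert(sizeof(char16_t) == 2, "demo assumes a 2-byte char16_t");
    char16_t c;
    std::memcpy(&c, bytes, sizeof c);
    return c;
}

// Combining the bytes arithmetically (first byte = high byte) gives the same
// value on every machine, however wide char16_t actually is.
char16_t load_via_shifts(const unsigned char bytes[2]) {
    return char16_t((bytes[0] << 8) | bytes[1]);
}
```

With the bytes {0x12, 0x34}, load_via_shifts returns 0x1234 on every machine, while load_via_memcpy returns 0x3412 on a little-endian machine and 0x1234 on a big-endian one.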

Answer:

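SHA-256 generates 256 essentially random bits (32 bytes) of data, so its output will not always be valid UTF-16. For example, a 16-bit unit in the surrogate range 0xD800 to 0xDFFF that is not part of a proper surrogate pair makes the sequence ill-formed.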
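To see why arbitrary hash bytes are not necessarily valid UTF-16, here is a sketch of a well-formedness check (is_valid_utf16 is a hypothetical helper, not part of any library). If a hash happens to start with the bytes D8 00 00 41 and you read them as two big-endian 16-bit units, you get 0xD800 followed by 0x0041: a high surrogate with no low surrogate after it.

```cpp
#include <cstddef>

// Returns true if the sequence is well-formed UTF-16: every high surrogate
// (0xD800..0xDBFF) is immediately followed by a low surrogate (0xDC00..0xDFFF),
// and no low surrogate appears on its own.
bool is_valid_utf16(const char16_t* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {          // high surrogate
            if (i + 1 == n || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return false;                      // not followed by a low surrogate
            ++i;                                   // skip the paired low surrogate
        } else if (c >= 0xDC00 && c <= 0xDFFF) {   // stray low surrogate
            return false;
        }
    }
    return true;
}
```

A database or driver that validates UTF-16 may reject or replace such a sequence, which is one way a stored hash could get corrupted.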

You need to encode the 32 bytes into more than 32 bytes of UTF-16 text to store them in your database. Alternatively, you can change the database field to a proper binary type that holds the 256 bits (32 bytes) directly.
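As a rough size comparison, here is the arithmetic for a few text encodings of the 32-byte hash, written as C++ compile-time checks. It assumes every character is stored as a single 2-byte UTF-16 code unit, which holds for a byte-per-character mapping and for the ASCII-only hex and base64 alphabets.

```cpp
#include <cstddef>

constexpr std::size_t hash_bytes = 256 / 8;  // 32 bytes of SHA-256 output

// Number of UTF-16 code units needed by each text encoding of the hash.
constexpr std::size_t one_per_byte = hash_bytes;                // 32: one char16_t per byte
constexpr std::size_t hex_chars    = hash_bytes * 2;            // 64: two hex digits per byte
constexpr std::size_t base64_chars = (hash_bytes + 2) / 3 * 4;  // 44: 4 chars per 3-byte group, padded

// Each UTF-16 code unit occupies 2 bytes of storage.
static_assert(one_per_byte * 2 == 64, "byte-per-char mapping: 64 bytes");
static_assert(hex_chars * 2 == 128,   "hex: 128 bytes");
static_assert(base64_chars * 2 == 88, "base64: 88 bytes");
```

A binary column, by comparison, needs only the 32 raw bytes.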

One of the easiest ways to store it in your DB as a string is to map each byte to one character 1-to-1 (so the 32 bytes of data are stored with 32 zero bytes interleaved, one as the high byte of each character):

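#include <cassert>
#include <cstddef>
#include <iterator>  // std::size

// get_hash, write_to_db and read_from_db stand in for your own hashing and database code.
unsigned char sha256_hash[256/8];
get_hash(sha256_hash);
// encoding: each byte becomes one char16_t in the range 0x00..0xFF
char16_t db_data[256/8];
for (std::size_t i = 0; i < std::size(db_data); ++i) {
    db_data[i] = char16_t(sha256_hash[i]);
}
write_to_db(db_data);


char16_t db_data[256/8];
read_from_db(db_data);
// decoding: every stored character must fit back into one byte
unsigned char sha256_hash[256/8];
for (std::size_t i = 0; i < std::size(sha256_hash); ++i) {
    assert(db_data[i] <= 0xFF);  // no cast: casting to uint16_t could hide high bits if char16_t is wider
    sha256_hash[i] = (unsigned char) db_data[i];
}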

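Be careful if you are using null-terminated strings, though. You will need an extra character for the null terminator, and you must map the 0 byte to something else (0x100 would be a good choice), because a 0 byte in the hash would otherwise end the string early. When decoding, map 0x100 back to 0.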
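A sketch of the null-terminated variant, using 0x100 as the stand-in for the 0 byte as suggested above. The helper names encode_hash_nt and decode_hash_nt are made up for this example. Decoding maps 0x100 back to 0 and rejects anything outside 0x01 to 0xFF, so a string that ends early is caught.

```cpp
#include <cstddef>

constexpr std::size_t kHashBytes = 256 / 8;

// Encode: one char16_t per byte, with 0x00 remapped to 0x100 so the data
// never contains a 0 code unit, followed by a real null terminator.
void encode_hash_nt(const unsigned char (&hash)[kHashBytes],
                    char16_t (&out)[kHashBytes + 1]) {
    for (std::size_t i = 0; i < kHashBytes; ++i)
        out[i] = hash[i] == 0 ? char16_t(0x100) : char16_t(hash[i]);
    out[kHashBytes] = u'\0';
}

// Decode: map 0x100 back to 0x00; reject anything that is not a valid code.
bool decode_hash_nt(const char16_t* in, unsigned char (&hash)[kHashBytes]) {
    for (std::size_t i = 0; i < kHashBytes; ++i) {
        char16_t c = in[i];
        if (c == 0x100)
            hash[i] = 0;
        else if (c >= 0x01 && c <= 0xFF)
            hash[i] = static_cast<unsigned char>(c);
        else
            return false;  // 0 (string ended early) or out of range
    }
    return in[kHashBytes] == u'\0';
}
```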

But if you have additional requirements (such as the stored value consisting of readable characters), you might consider base64 or hexadecimal encoding instead.
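For the hexadecimal option, a minimal C++ sketch could look like this (to_hex16 and from_hex16 are hypothetical names). Each byte becomes two ASCII hex digits, so the result is always valid UTF-16, contains no 0 code units, and does not depend on byte order or on how wide char16_t is. Base64 works along the same lines with a 64-character alphabet and produces shorter output, but takes more code.

```cpp
#include <cstddef>
#include <string>

// Encode bytes as lowercase hexadecimal in a UTF-16 string (2 chars per byte).
std::u16string to_hex16(const unsigned char* data, std::size_t n) {
    static const char16_t digits[] = u"0123456789abcdef";
    std::u16string out;
    out.reserve(n * 2);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0F]);  // low nibble
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Decode a hex string back into n bytes; returns false on malformed input.
bool from_hex16(const std::u16string& hex, unsigned char* out, std::size_t n) {
    if (hex.size() != n * 2) return false;
    for (std::size_t i = 0; i < n; ++i) {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = static_cast<unsigned char>(hi << 4 | lo);
    }
    return true;
}
```

For a SHA-256 hash you would call to_hex16(sha256_hash, 32) and store the resulting 64-character string.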
