Is it possible to compute a hash by small chunks?-CodePudding

I am currently working on a firmware update mechanism for embedded system with low memories.

Part of the system requires to hash (with SHA-256) the binary file before sending the file (other security features are added but do not impact this question). The device must verify this hash before validating it, but is pretty low on memory. It will receive the data in small chunks, and I am wondering if it is possible to compute partial hash "on the fly", to avoid loading the whole binary again after the full transfer.

As an example, let's say the data to hash is "part1part2part3". The hash of the full data is "hash", the hash of "part1" is "hash1", the hash of "part2" is "hash2" and the hash of "part3" is "hash3".

Is there any mathematical operation I can do to convert partial hashes to the full one? Something like

hashReceived = hash
tempHash = operation(hash1,hash2)
tempHash = operation(tempHash, hash3)
if(hashReceived == tempHash)
... continue
else
... fail

I am looking for a mathematical property (something like the distributive property) of SHA-256 that could allow such behavior without breaking any of the SHA-256 properties.

CodePudding user response：

The way you've described this is not possible. You cannot combine "subhashes" to determine a full hash. If you could do that, the hash would be subject to length-extension attacks and not be secure. ("Secure" here is defined in a fairly precise and technical way. See the link from Stef about other hashing approaches that loosen this requirement.)

But, as the question's comments note, it is completely possible to stream data into SHA256 without ever holding all the data in memory. That is the normal way that hash functions are computed. SHA256 works on a block size of 64 bytes. That is all the data you need to hold at a time, plus 32 bytes of state.

Most common hashing libraries have this as part of the API. It generally looks something like:

hasher = create_hasher()
update_hash(hasher, data1)
update_hash(hasher, data2)
update_hash(hasher, data3)
final_hash = compute_hash(hasher)

The hasher in this example mutates its internal state every time update_hash is called, and then finalizes the hash when compute_hash is called. Calls to update_hash() do not allocate any new memory, and there's no need to hold onto the data packets after they've been used to update the hash.