I was studying SHA2 today, and got to a place in the code which I do not understand.
RFC 4634 (where SHA2 is defined) defines two variables, Length_Low
and Length_High
, in the sha.h
file:
uint32_t Length_Low; /* Message length in bits */
uint32_t Length_High; /* Message length in bits */
In sha224-256.c
file the variable Length_Low
is actively changed at two places, for example. Here:
/*
* add "length" to the length
*/
static uint32_t addTemp;
#define SHA1AddLength(context, length) \
(addTemp = (context)->Length_Low, \
(context)->Corrupted = \
(((context)->Length_Low += (length)) < addTemp) && \
(++(context)->Length_High == 0) ? 1 : 0)
and here:
/*
* Store the message length as the last 8 octets
*/
context->Message_Block[56] = (uint8_t) (context->Length_High >> 24);
context->Message_Block[57] = (uint8_t) (context->Length_High >> 16);
context->Message_Block[58] = (uint8_t) (context->Length_High >> 8);
context->Message_Block[59] = (uint8_t) (context->Length_High);
context->Message_Block[60] = (uint8_t) (context->Length_Low >> 24);
context->Message_Block[61] = (uint8_t) (context->Length_Low >> 16);
context->Message_Block[62] = (uint8_t) (context->Length_Low >> 8);
context->Message_Block[63] = (uint8_t) (context->Length_Low);
I understood RFC 4634 and the principles of SHA2. However, being a beginner in C I can understand only 99% of the code from the paper. The code fragments I attached belong to this remaining 1%.
Could you please explain what role the Length_Low
and Length_High
variables play in the implementation of SHA2? What is their meaning on the code level?
Secondly, what happens in the code fragments? I can identify compound operators and shifts, but I am overwhelmed by the difficulty of the code, especially in the second fragment, where dereferencing, increment, definition, etc. happen in the same line of code.
CodePudding user response:
Meta: stackexchange, and especially stackoverflow, policy is not to post images of 'code', defined expansively to include things like config files and log or error messages, because they are very hard to read on mobile devices, impossible to read by visually impaired people, not cut&pastable, and not searchable. Plus yours, reformatted and colored I presume by your IDE, are to my taste exceptionally ugly. Fortunately all RFCs are published in text form, which I could easily substitute.
Also, an aside: the input block size for SHA-256 and SHA-224 (and SHA-1 and MD5 before them) is 64 octets or 512 bits, not 64 bits, and anyway has nothing to do with the length field. The length field is indeed 64 bits and implemented in the code as two 32-bit variables for the high and low half.
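To make the "two halves" idea concrete, here is a minimal sketch of how the two 32-bit variables relate to one 64-bit bit count. The function name total_bits is hypothetical and does not appear in the RFC; it only assumes a compiler that provides uint64_t.

```c
#include <stdint.h>

/* Hypothetical helper (not in RFC 4634): combine the two 32-bit halves
 * into the single 64-bit message length, in bits, that they represent. */
static uint64_t total_bits(uint32_t high, uint32_t low)
{
    /* The high half counts multiples of 2^32 bits. */
    return ((uint64_t)high << 32) | low;
}
```

For example, a 3-octet message would give high = 0 and low = 24. The RFC code keeps the halves separate, presumably so that it does not have to rely on a 64-bit integer type being available.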
static uint32_t addTemp;
#define SHA224_256AddLength(context, length) \
(addTemp = (context)->Length_Low, (context)->Corrupted = \
(((context)->Length_Low += (length)) < addTemp) && \
(++(context)->Length_High == 0) ? 1 : 0)
is rather tricky code to add the length in bits of a piece of input data (in actual use always 8 for a full octet, or 1 to 7 for leftover bits) to the 2x32-bit length field. First it saves the incoming Length_Low
in addTemp
, and then, from the inside out:
(context)->Length_Low += (length) // call this CODE1
adds the value of length
to the Length_Low
field in the structure; because this uses unsigned arithmetic in C, if the result (mathematically) overflows it is wrapped around (taken modulo 2^32). Thus the sum (the new value in Length_Low
) is smaller than the original value in addTemp
if and only if overflow/wraparound occurred. This is tested and in that case the Length_High
field is incremented:
( CODE1 < addTemp) && (++(context)->Length_High == 0) // call this CODE2
If the high half is zero after incrementing, that means it also overflowed (wrapped around), so the real message length is too big to fit in the 64-bit field that the spec requires. This is considered an error and stored in the Corrupted
field, which will be tested later to report that the hashing operation failed:
(addTemp=..., (context)->Corrupted = CODE2 ? 1 : 0)
It should be noted the ? 1 : 0
is technically unnecessary; the &&
and ||
operators in C (and also the comparison/equality operators like <
and ==
) are defined to return one for true and zero for false already (although tests like if(x)
and while(x)
accept any nonzero value as true). However, some people feel writing this out is clearer, and that motivation is especially strong in an RFC which is published to a wide audience including those (like you!) with little knowledge of C.
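As a small illustration of both points (unsigned wraparound and comparisons already yielding 0 or 1), here is a sketch with a hypothetical helper named wrapped, which is not part of the RFC code:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a + b wrapped around modulo 2^32,
 * and 0 otherwise. No "? 1 : 0" is needed, because "<" already
 * produces an int that is exactly 0 or 1. */
static int wrapped(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* unsigned arithmetic: wraps, never traps */
    return sum < a;         /* smaller than an operand only if it wrapped */
}
```

Calling wrapped(0xFFFFFFFF, 1) gives 1 because the sum wraps to 0, while wrapped(1, 2) gives 0.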
In fact, it might have been better to write this as a (very small) function instead of a macro, which would allow use of more obvious statements instead of complicated nested expressions, and which any decent compiler in 2006 (much less now) would inline and fold to produce the same code as the macro. But the world isn't perfect.
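For illustration, this is roughly what such a function could look like. It is only a sketch: the struct LengthContext and the name add_length are hypothetical, and the struct contains just the three fields the macro touches (the real context structure in the RFC has more members).

```c
#include <stdint.h>

/* Hypothetical, cut-down context holding only the fields used here. */
typedef struct {
    uint32_t Length_Low;   /* low 32 bits of the message length in bits  */
    uint32_t Length_High;  /* high 32 bits of the message length in bits */
    int      Corrupted;    /* nonzero if the 64-bit length overflowed    */
} LengthContext;

/* Add `length` bits to the 64-bit counter kept as two 32-bit halves.
 * Like the macro, it assigns Corrupted on every call and yields that
 * value: 1 if the whole 64-bit counter overflowed, otherwise 0. */
static int add_length(LengthContext *context, uint32_t length)
{
    uint32_t old_low = context->Length_Low;

    context->Length_Low += length;          /* wraps modulo 2^32 */
    if (context->Length_Low < old_low) {    /* wrapped: carry into high half */
        context->Length_High++;
        if (context->Length_High == 0) {    /* high half wrapped as well */
            context->Corrupted = 1;
            return 1;
        }
    }
    context->Corrupted = 0;
    return 0;
}
```

The if statements do the same job as the && in the macro: the high half is only incremented when the low half wrapped, because && does not evaluate its right-hand side when the left-hand side is false.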
Your third chunk, in comparison, is quite simple. It just takes the 2x32-bit length field and stores it as a big-endian sequence of eight 8-bit units (octets), most significant first, in the last 8 elements of the current Message_Block
.
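The same eight assignments can be written as a loop, which may make the pattern easier to see. This is a sketch only; store_length is a hypothetical name and it takes the block and the two halves as plain parameters instead of a context structure.

```c
#include <stdint.h>

/* Hypothetical helper: write the 64-bit length, kept as two 32-bit
 * halves, into bytes 56..63 of a 64-byte block, most significant
 * byte first (big-endian). */
static void store_length(uint8_t block[64], uint32_t high, uint32_t low)
{
    for (int i = 0; i < 4; i++) {
        int shift = 24 - 8 * i;                    /* 24, 16, 8, 0 */
        block[56 + i] = (uint8_t)(high >> shift);  /* bytes 56..59 */
        block[60 + i] = (uint8_t)(low >> shift);   /* bytes 60..63 */
    }
}
```

Each shift moves the wanted 8 bits down to the bottom of the value, and the cast to uint8_t throws away everything above them. With high = 0x01020304, for example, bytes 56 to 59 become 0x01, 0x02, 0x03, 0x04.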