Context
I have a char variable on which I need to apply a transformation (for example, add an offset). The result of the transformation may or may not overflow.
I don't really care about the actual value of the variable after the transformation is performed.
The only guarantee I want is that I can retrieve the original value if I perform the transformation again in the opposite direction (for example, subtract the offset).
Basically:
char a = 42;
a += 140; // overflows (undefined behaviour)
a -= 140; // must be equal to 42
Problem
I know that overflow of signed types is undefined behaviour, but that is not the case for unsigned types. I have therefore chosen to add an intermediate step to the process to perform the conversion.
It would then become:
- char -> unsigned char conversion
- Apply the transformation (resp. the reversed transformation)
- unsigned char -> char conversion
This way, I have the guarantee that any potential overflow will only occur on an unsigned type.
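The three steps above can be sketched as follows. The helper names `shift_forward`, `shift_back` and `check_round_trip` are hypothetical and not part of the original post. The sketch assumes an implementation where the final unsigned char -> char conversion wraps (guaranteed since C++20, implementation-defined before that).

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: char -> unsigned char, add the offset, convert back.
// The sum is computed in int and reduced modulo 2^CHAR_BIT by the cast.
char shift_forward(char c, unsigned char offset) {
    unsigned char u = static_cast<unsigned char>(c);
    u = static_cast<unsigned char>(u + offset);
    return static_cast<char>(u);
}

// Hypothetical helper: the reverse transformation (subtract the offset).
char shift_back(char c, unsigned char offset) {
    unsigned char u = static_cast<unsigned char>(c);
    u = static_cast<unsigned char>(u - offset);
    return static_cast<char>(u);
}

// Checks that every char value survives a forward/backward round trip.
void check_round_trip(unsigned char offset) {
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char c = static_cast<char>(i);
        assert(shift_back(shift_forward(c, offset), offset) == c);
    }
}
```

Calling `check_round_trip(140)` exercises the case from the example above, where adding 140 to 42 leaves the range of an 8-bit signed char.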
Question
My question is: what is the proper way to perform such a conversion?
Three possibilities come to mind:
- implicit conversion
- static_cast
- reinterpret_cast
Which one is valid (not undefined behaviour)? Which one should I use (correct behaviour)?
My guess is that I need to use reinterpret_cast: since I don't care about the actual value, the only guarantee I want is that the value in memory remains the same (i.e. the bits don't change), so that the operation is reversible.
On the other hand, I'm not sure whether the implicit conversion or the static_cast triggers undefined behaviour when the value is not representable in the destination type (out of range).
I couldn't find anything explicitly stating whether it is or is not undefined behaviour. I only found this Microsoft documentation, where they did it with implicit conversions without any mention of undefined behaviour.
Here is an example, to illustrate:
char a = -4; // out of unsigned char range
unsigned char b1 = a; // (A)
unsigned char b2 = static_cast<unsigned char>(a); // (B)
unsigned char b3 = reinterpret_cast<unsigned char&>(a); // (C)
std::cout << std::boolalpha << (b1 == b2 && b2 == b3) << '\n';
unsigned char c = 252; // out of (signed) char range
char d1 = c; // (A')
char d2 = static_cast<char>(c); // (B')
char d3 = reinterpret_cast<char&>(c); // (C')
std::cout << std::boolalpha << (d1 == d2 && d2 == d3) << '\n';
The output is:
true
true
Unless undefined behaviour is triggered, the three methods seem to work.
Are (A) and (B) (resp. (A') and (B')) undefined behaviour if the value is not representable in the destination type ?
Is (C) (resp. (C')) well defined ?
CodePudding user response:
I know that signed types overflow is undefined behaviour,
True, but does not apply here.
a += 140; is not signed integer overflow, and not UB. It is like a = a + 140;. The addition a + 140 is performed in int, so it does not overflow when a is an 8-bit signed char or unsigned char.
The issue is what happens when the sum a + 140 is out of char range and is assigned to a char.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C17dr § 6.3.1.3 3
It is implementation-defined behavior to assign a value outside the char range when char is signed and 8-bit.
Usually the implementation-defined behavior is to wrap, and it is fully defined, so a += 140; is fine as is.
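The split between the addition and the assignment can be made visible in code. The following is a minimal sketch with hypothetical helper names (`sum_before_narrowing`, `narrowed`, `demo_promotion`). It assumes a typical platform where int is wider than char and where the narrowing conversion wraps (guaranteed since C++20).

```cpp
#include <cassert>
#include <type_traits>

// char + int is evaluated in int because of integer promotion,
// so the addition itself does not overflow for an 8-bit char.
static_assert(std::is_same<decltype(char{} + 140), int>::value,
              "char + int is evaluated as int");

// Hypothetical helper: the int sum, before any narrowing takes place.
int sum_before_narrowing(char a) {
    return a + 140;
}

// Hypothetical helper: the narrowing step, where the value may not fit in char.
// Implementation-defined before C++20, modulo 2^CHAR_BIT since C++20.
char narrowed(char a) {
    return static_cast<char>(a + 140);
}

void demo_promotion() {
    char a = 42;
    assert(sum_before_narrowing(a) == 182);              // plain int arithmetic
    assert(static_cast<char>(narrowed(a) - 140) == a);   // wraps back to 42
}
```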
Alternatively, the implementation-defined behavior might have been to cap the value to the char range when char is signed.
char a = 42;
a += 140;
// Might act as if
a = max(min(a + 140, CHAR_MAX), CHAR_MIN);
a = 127;
To avoid implementation-defined behavior, perform the + or - on a accessed as an unsigned char:
*((unsigned char *)&a) += small_offset;
Or just use unsigned char a and avoid all this. unsigned char is defined to wrap.
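In C++ the same in-place access can be written with a reference cast, which corresponds to variant (C) in the question. This is a sketch with hypothetical helper names (`add_offset_in_place`, `sub_offset_in_place`, `demo_in_place`). It relies on the rule that any object may be accessed through an unsigned char lvalue.

```cpp
#include <cassert>

// Hypothetical helper: adds the offset to the byte of `a`, viewed as unsigned char.
// The sum is computed in int and reduced modulo 2^CHAR_BIT by the cast.
void add_offset_in_place(char& a, unsigned char offset) {
    unsigned char& u = reinterpret_cast<unsigned char&>(a);
    u = static_cast<unsigned char>(u + offset);
}

// Hypothetical helper: the reverse operation.
void sub_offset_in_place(char& a, unsigned char offset) {
    unsigned char& u = reinterpret_cast<unsigned char&>(a);
    u = static_cast<unsigned char>(u - offset);
}

void demo_in_place() {
    char a = 42;
    add_offset_in_place(a, 140);   // the intermediate value of a is not relied upon
    sub_offset_in_place(a, 140);
    assert(a == 42);               // the original value is restored
}
```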
CodePudding user response:
For full portability, you do have a small problem insofar as (except for char [1]) signed data types have not been [2] required to have as many distinct values as their unsigned counterparts. Very few systems actually used sign-magnitude representation for integral types, but if you cannot rule them out, then simply doing the math in the unsigned counterpart does not actually guarantee round-tripping, even if you use numeric_limits<?>::min() to try to avoid conversion of unrepresentable values.
With that caveat out of the way, the direct answer to your question is that both implicit conversion and static_cast are correct (and equivalent) for converting a value between its signed and unsigned counterpart types. In the signed->unsigned direction, the behavior is well-defined by the Standard, while in the other direction the behavior is implementation-defined.
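A short sketch of both directions, using signed char so that the result does not depend on whether plain char is signed. The function names are hypothetical, and the last assertion assumes an implementation where the unsigned->signed conversion wraps (required since C++20).

```cpp
#include <cassert>
#include <climits>

// Signed -> unsigned: the result is the value modulo 2^CHAR_BIT on every implementation.
unsigned char to_unsigned_implicit(signed char s) { return s; }
unsigned char to_unsigned_cast(signed char s) { return static_cast<unsigned char>(s); }

// Unsigned -> signed: implementation-defined before C++20 when the value is
// out of range, modulo 2^CHAR_BIT since C++20.
signed char to_signed_cast(unsigned char u) { return static_cast<signed char>(u); }

void demo_conversions() {
    // -4 maps to UCHAR_MAX - 3 (252 when CHAR_BIT is 8).
    assert(to_unsigned_implicit(-4) == UCHAR_MAX - 3);
    // Implicit conversion and static_cast give the same value.
    assert(to_unsigned_cast(-4) == to_unsigned_implicit(-4));
    // Round trip back to the signed type (assumes wrapping conversion).
    assert(to_signed_cast(to_unsigned_cast(-4)) == -4);
}
```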
[1] char and signed char themselves are rescued from this possibility by their endorsement for access to the byte representation of any object, including to unsigned objects which are required not to have any missing values.
[2] Two's complement conversion behavior is required in the latest version of C++, see https://eel.is/c++draft/basic.fundamental#3