I'm working on a virtual machine that uses a typical Smi (small integer) encoding, where integers are represented inside tagged pointer values. More precisely, pointers are tagged and integers are just shifted. This is the same approach as the one taken by V8 and Dart: https://github.com/v8/v8/blob/main/src/objects/smi.h#L17
In our implementation we have the following code for the Smi:
// In smi.h
#include <stdint.h>

class Object {
 public:
  bool is_smi() const { return (reinterpret_cast<uintptr_t>(this) & 0x1) == 0; }
};

class Smi : public Object {
 public:
  intptr_t value() const { return reinterpret_cast<intptr_t>(this) >> 1; }
  static Smi* from(intptr_t value) { return reinterpret_cast<Smi*>(value << 1); }
  static Smi* cast(Object* obj) { return static_cast<Smi*>(obj); }
};
With this setup, the following function is optimized by GCC 12.1.0 at -O3 so that the 'if' is never taken when o has the Smi value 0.
// bad_optim.cc
#include <stdio.h>

#include "smi.h"

void bad_optim(Object* o) {
  if (!o->is_smi() || o == Smi::from(0)) {
    printf("in if\n");
  }
}
If I replace the 'if' line with the following code, the check works:
if (!o->is_smi() || Smi::cast(o)->value() == 0) {
I'm guessing we are hitting undefined behavior, but it's not clear to me which one.
Furthermore, it would be good to know whether there is a flag that warns about this behavior. Alternatively, maybe there is a flag to disable this optimization.
For completeness' sake, here is a main that triggers the behavior. (Note that the bad_optim and main functions must be compiled separately.)
// main.cc
#include "smi.h"

void bad_optim(Object* o);

int main() {
  Smi* o = Smi::from(0);
  bad_optim(o);
  return 0;
}
Answer:
It's simple: calling a member function through an invalid or null o would be UB, so after the call the compiler is entitled to assume o can't be null. Calling is_smi() counts as a dereference, even though it never actually accesses the memory. And since Smi::from(0) reinterpret_casts 0 << 1, i.e. a null pointer, GCC folds the o == Smi::from(0) comparison to false.
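In other words, the optimizer is entitled to reason roughly as follows (a sketch of the inference, not GCC's literal output):

// What GCC is effectively allowed to turn bad_optim into:
void bad_optim(Object* o) {
  bool tag_clear = o->is_smi();  // member call through o: UB if o is null,
                                 // so the compiler may assume o != nullptr
  // Smi::from(0) is reinterpret_cast<Smi*>(0 << 1), i.e. a null pointer;
  // o was just assumed non-null, so this disjunct folds to false:
  if (!tag_clear /* || o == nullptr */) {
    printf("in if\n");
  }
}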
Make is_smi() a free function, since the non-null assumption only applies to this, not to ordinary pointer parameters. I'd also make Object an opaque struct (declared but not defined), so tagged values can never be dereferenced by accident.
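A minimal sketch of that rework, with hypothetical names (is_smi, smi_value, smi_from) that you'd adapt to your codebase:

// smi.h (sketch)
#include <stdint.h>

struct Object;  // opaque: declared but never defined, so it cannot be dereferenced

inline bool is_smi(const Object* o) {
  // o is an ordinary parameter, not `this`, so inspecting a tagged value is fine
  return (reinterpret_cast<uintptr_t>(o) & 0x1) == 0;
}

inline intptr_t smi_value(const Object* o) {
  return reinterpret_cast<intptr_t>(o) >> 1;
}

inline Object* smi_from(intptr_t value) {
  return reinterpret_cast<Object*>(value << 1);
}

The check then becomes if (!is_smi(o) || o == smi_from(0)), which involves no member call, so the compiler has no grounds to assume o is non-null and the comparison against the null Smi survives. As for a flag: GCC's -fno-delete-null-pointer-checks disables optimizations that rely on dereferenced pointers being non-null and may paper over this particular fold, but removing the UB is the more robust fix.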