C++ casting: how does casting really work?

I'm trying to understand exactly how casting between base and derived types works in C++, so I wrote a small proof-of-concept program:

#include <iostream>

class Foo {
public:
  Foo() {}
};

// Bar is a subclass of Foo
class Bar : public Foo {
public:
  Bar() : Foo() {}
  void bar() { std::cout << "bar" << std::endl; }
  void bar2() { std::cout << "bar with " << i << std::endl; }

private:
  int i = 0;
};

where Foo is the base and Bar is derived from Foo.

Currently, my understanding of casting is:

A cast is a runtime thing. The compiler can do us a favor by checking casts during compilation, but the actual type conversion occurs at runtime.
Upcast (e.g. Foo f = Bar()), either explicit or implicit, should always be fine.
Downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can force the cast by using static_cast.

I wrote three different programs to verify my understanding. Each program is compiled using:

g++ -std=c++17 -Wall -Wextra -pedantic

Case #1

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  return 0;
}

The code compiles successfully. Running this program does not produce any error.

My thoughts: OK, although the cast is not really correct, since we are treating an instance of Foo as a Bar at runtime, we don't see any error because we never actually use b.

Case #2

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar();
  return 0;
}

The code compiles successfully. Running this program does not produce any error, and "bar" is printed.

This is where I start to get confused: why does this program work at all, and why does "bar" get printed? Although we are treating a Foo as a Bar here, the underlying instance is still a Foo, and it has no method named "bar" defined on it. How can this code work?

Case #3

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar2();
  return 0;
}

The code compiles successfully. Running this program does not produce any error, and "bar with 1981882368" (some random number) is printed.

I'm even more confused here: in terms of memory layout, the underlying Foo instance has no space reserved for the member i, which is defined in Bar. How can this code still work?

Please help me understand the programs above! Thanks in advance!

Answer 1:

A cast is a runtime thing. The compiler can do us a favor by checking casts during compilation, but the actual type conversion occurs at runtime.

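No. With the exception of dynamic_cast, casts are purely compile-time constructs: the compiler decides at compile time what code, if any, a cast produces (for example a numeric conversion or a pointer adjustment), and no type check happens at runtime. After all, there are (almost) no types at runtime in C++.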

Upcast (e.g. Foo f = Bar()), either explicit or implicit, should always be fine.

Yes, upcasts are safe.

Downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can force the cast by using static_cast.

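No, it is not prohibited; there are just some non-trivial rules. Some casts/conversions are implicit, while others must be requested explicitly. (The by-value example Bar b = Foo() does not compile even with a cast, since Bar has no constructor taking a Foo; explicit downcasts apply to pointers and references, as in static_cast<Bar &>(f).)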

Case 1: This is undefined behaviour (UB) because b does not refer to a real Bar object. This cast assumes that the user knows what they are doing, perhaps because they have some external information about the true type of the object, although that is not the case here.

Case 2: You have triggered the UB, so anything can happen. In this case, the compiler most likely just called bar and passed the address of f as the this pointer. That is of course incorrect, but that is your problem. Since the method never uses the this pointer, there is not much to break.

Case 3: Now you are really digging into this UB. The compiler most likely just calculated this + offsetof(Bar, i) and assumed that the resulting address points to an int. The fact that it does not is your problem, for breaking the promise that you were downcasting to the correct type.

Answer 2:

I gather this question is about how this was actually able to happen, more than about what the C++ language standard says should happen.

So let's look at what happened, in an example program compiled for x64 without optimization (same as you did):

#include <iostream>

class Foo {
public:
  Foo() {}
};

// Bar is a subclass of Foo
class Bar : public Foo {
public:
  Bar() : Foo() {}
  void bar() { std::cout << "bar" << std::endl; }
  void bar2() { std::cout << "bar with " << i << std::endl; }

private:
  int i = 0;
};

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar2();
  return 0;
}

Relevant parts of the assembly:

_ZN3Bar4bar2Ev:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
        mov     rdx, rax
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     esi, eax
        mov     rdi, rdx
        call    _ZNSolsEi
        mov     esi, OFFSET FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
        mov     rdi, rax
        call    _ZNSolsEPFRSoS_E
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        lea     rax, [rbp-9]
        mov     rdi, rax
        call    _ZN3FooC1Ev
        lea     rax, [rbp-9]
        mov     QWORD PTR [rbp-8], rax
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    _ZN3Bar4bar2Ev
        mov     eax, 0
        leave
        ret

The important things to note here are:

The call to bar2 is written as call _ZN3Bar4bar2Ev. Nothing about that physically requires an instance of Bar: methods "belonging to" classes are a high-level illusion, and it's not as if they're actually packaged inside the object in any real sense. There is really just a function with a funny (mangled) name, and it expects a pointer to an object of the appropriate type as a hidden parameter, but you can go ahead and violate its expectations. Of course, unexpected things may happen when you do that, since bar2 is just going to forge ahead blindly, regardless of what junk it receives as its implicit this parameter.
By the way, things would be a bit different with a virtual call. Even virtual calls do not rely on the method name, though, and they also won't check whether the object you're calling the method on actually has a sensible type. I won't go too deeply into virtual calls since they were not part of the question; you can read other Q&As such as How are virtual functions and vtable implemented?.
bar2 accesses the member i like this: mov eax, DWORD PTR [rax], i.e. it loads a 4-byte quantity at offset zero from whatever address it received (whatever bar2 receives as its hidden first parameter, even if it is not an address, is used by that mov as if it were an address). No types are involved, no member names are involved, no checks are made. Memory is accessed blindly, and whatever happens, happens. In the build shown above, main passes the address of f, which occupies the single byte at [rbp-9], while the pointer b is stored right after it at [rbp-8]. So the 4 bytes read are f's uninitialized byte followed by the low 3 bytes of b's own value, which is why the printed number looks random.

This is all quite tame: even though various rules were broken, the "default thing" (proceeding as if nothing was wrong and letting the results be whatever "naturally" happens) happened anyway. That is fairly common (but neither universal nor guaranteed) when compiling without optimizations. It may even happen when compiling with optimizations, but then you're more likely to see various compiler shenanigans.