I'm trying to understand exactly how casting between base and derived types works in C++, so I wrote a small proof-of-concept program:
#include <iostream>

class Foo {
public:
Foo() {}
};
// Bar is a subclass of Foo
class Bar : public Foo {
public:
Bar() : Foo() {}
void bar() { std::cout << "bar" << std::endl; }
void bar2() { std::cout << "bar with " << i << std::endl; }
private:
int i = 0;
};
where Foo is the base and Bar is derived from Foo.
Currently, my understanding of casting is:
- A cast is a runtime thing. The compiler can do us a favor by checking casts during compilation, but the actual type conversion occurs at runtime.
- An upcast (e.g. Foo f = Bar()), either explicit or implicit, should always be fine.
- A downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can force the cast by using static_cast.
I wrote 3 different programs to verify my understanding. Each program is compiled using
g++ -std=c++17 -Wall -Wextra -pedantic
Case #1
int main() {
Foo f;
Bar &b = static_cast<Bar &>(f);
return 0;
}
The code compiles successfully. Running this program does not result in any error.
My thoughts: OK, although the cast is not right, since we are treating an instance of Foo as a Bar at runtime, we don't see any error because we never actually operate on b.
Case #2
int main() {
Foo f;
Bar &b = static_cast<Bar &>(f);
b.bar();
return 0;
}
The code compiles successfully. Running this program does not result in any error, and "bar" is printed.
This is where I start to get confused: why does this program work at all, and why does "bar" get printed? Although we are treating a Foo as a Bar, the underlying instance is still a Foo, and it has no method named "bar" defined on it. How can this code work?
Case #3
int main() {
Foo f;
Bar &b = static_cast<Bar &>(f);
b.bar2();
return 0;
}
The code compiles successfully. Running this program does not result in any error, and "bar with 1981882368" (some random number) is printed.
I'm even more confused here: if we think in terms of memory layout, the underlying Foo instance has no space reserved for the member i, which is defined in Bar. How can this code still work?
Please help me understand the programs above! Thanks in advance!
CodePudding user response:
Cast is a runtime thing. Compiler can do us a favor by checking them during compilation, but the actual type conversion occurs during runtime
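No. With the exception of dynamic_cast, all casts are resolved at compile time: the compiler decides what the conversion means, and nothing checks the object's actual type while the program runs. After all, there are (almost) no types at runtime in C++.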
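Upcast (e.g. Foo f = Bar()), either explicit or implicit, should be always fine
Yes, upcasts are safe. Note that Foo f = Bar() copies only the Foo part of the temporary into a new object (slicing), whereas a reference or pointer upcast keeps referring to the original object.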
Downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can enforce the cast by using static_cast
No, it is not prohibited; there are just some non-trivial rules. Some casts/conversions are implicit, while others must be requested explicitly.
Case 1: This is undefined behaviour (UB) because b does not refer to a real Bar object. static_cast assumes the user knows what they are doing, perhaps because they have some external information about the true type of the object, which is not the case here.
Case 2: You have triggered the UB, so anything can happen. In this case, the compiler likely just called bar and passed the address of b as the this pointer. That is of course incorrect, but that is your problem. Since the method does not use the this pointer, there is not much to break.
Case 3: Well, now you are really digging into the UB. The compiler likely just computed this + offsetof(Bar, i) and assumed that address points to an integer. The fact that it does not is your problem, for breaking the promise that the downcast targets the correct type.
CodePudding user response:
I gather this question is about how this was actually able to happen, more than about what the C++ language standard says should happen.
So let's look at what happened, in an example program compiled for x64 without optimization (same as you did):
#include <iostream>
class Foo {
public:
Foo() {}
};
// Bar is a subclass of Foo
class Bar : public Foo {
public:
Bar() : Foo() {}
void bar() { std::cout << "bar" << std::endl; }
void bar2() { std::cout << "bar with " << i << std::endl; }
private:
int i = 0;
};
int main() {
Foo f;
Bar &b = static_cast<Bar &>(f);
b.bar2();
return 0;
}
Relevant parts in assembly:
_ZN3Bar4bar2Ev:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov rdx, rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov rdi, rdx
call _ZNSolsEi
mov esi, OFFSET FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
mov rdi, rax
call _ZNSolsEPFRSoS_E
nop
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-9]
mov rdi, rax
call _ZN3FooC1Ev
lea rax, [rbp-9]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov rdi, rax
call _ZN3Bar4bar2Ev
mov eax, 0
leave
ret
Important things to see here are:
- The call to bar2 is written as call _ZN3Bar4bar2Ev. Nothing about that physically requires an instance of Bar. Methods "belonging to" classes is a high-level illusion; it's not as if they're actually packaged inside there in any real sense. There is really just a function with a funny (mangled) name, and it expects a pointer to an object of the appropriate type as a hidden parameter, but you can go ahead and violate its expectations. Of course, unexpected things may happen when you do that, since bar2 is just going to forge ahead blindly, regardless of what junk it receives as its implicit this parameter.
  By the way, things would be a bit different with a virtual call. Even those do not rely on the method name though, and they also won't check whether the object that you're calling the method on actually has a sensible type. I won't go too deeply into virtual calls since they were not part of the question; you can read some other Q&As such as "How are virtual functions and vtable implemented?".
- bar2 accesses the member i like this: mov eax, DWORD PTR [rax], i.e. it loads a 4-byte quantity from an offset of zero from whatever address it received (whatever bar2 receives as its hidden first parameter, even if it is not an address, will be used by that mov as if it were an address). No types are involved, no member names are involved, no checks are made. Memory is accessed blindly, and whatever happens, happens.
This is all quite tame - even though various rules were broken, the "default thing" (proceeding as if nothing was wrong and letting the results be whatever "naturally" happens) happened anyway. That is somewhat common (but not universal, and not guaranteed) when compiling without optimizations. It may even happen when compiling with optimizations, but then you're more likely to see various compiler shenanigans.