EDIT (TL;DR): I was a victim of object slicing, which I didn't know about. The original question follows.
I'm trying to understand how std::vector<MyClass> stores objects when an instance of MyDerived is push_backed into it. Also, how do iterators know where the next element starts in memory, so that the increment operator knows how to get there? Consider the following code sample:
#include <iostream>
#include <vector>
using namespace std;
class BaseShape
{
public:
// BaseShape() { cout << "BaseShape() "; }
virtual void draw() const { cout << "BASE?\n"; }
};
class Circle : public BaseShape
{
public:
Circle() { cout << "Circle()"; }
virtual void draw() const override { cout << "Circle!\n"; }
void *somePointer, *ptr2;
};
class Triangle : public BaseShape
{
public:
Triangle() { cout << "Triangle()"; }
virtual void draw() const override { cout << "Triangle!\n"; }
void *somePtr, *ptr2, *ptr3, *ptr4, *ptr5;
};
int main()
{
cout << "vector<BaseShape *> ";
vector<BaseShape *> pShapes{new BaseShape(), new Circle(), new Triangle(), new Circle()};
cout << endl;
for (vector<BaseShape *>::iterator it = pShapes.begin(); it != pShapes.end(); ++it)
{
cout << *it << " ";
(*it)->draw();
}
// vector<BaseShape *> Circle()Triangle()Circle()
// 01162F08 BASE?
// 01162F18 Circle!
// 011661A0 Triangle!
// 01162F30 Circle!
cout << "\nvector<BaseShape> ";
vector<BaseShape> shapes{BaseShape(), Circle(), Triangle(), Circle()};
cout << endl;
for (vector<BaseShape>::iterator it = shapes.begin(); it != shapes.end(); ++it)
{
cout << &(*it) << " ";
(*it).draw();
}
// vector<BaseShape> Circle()Triangle()Circle()
// 01162FD0 BASE?
// 01162FD4 BASE?
// 01162FD8 BASE?
// 01162FDC BASE?
return 0;
}
In vector<BaseShape *> pShapes, I understand that pShapes only stores pointers to the actual shapes. It is then easy to know how far to advance the iterator it, since all pointers have the same size. The console output shows how *it jumps around in memory for "Triangle".
My doubt arises when vector<BaseShape> shapes is used instead. Maybe my understanding is wrong, but I believe that shapes stores the BaseShape objects themselves (more on this later). If that is correct, then when I push_back a Circle or a Triangle object into it, how can all the objects be stored contiguously in memory? That doesn't sound possible, as Circle and Triangle have different sizes, and their extra members must sit right next to the BaseShape part of the object (e.g. [BaseShape mem][Circle mem]). Furthermore, how does it know exactly how far to jump to reach the next object? In the console output, I can see that it only advanced the address by 4 each time, which leads me to conclude that somehow only the BaseShape part was stored. Is the [Circle mem] just dropped? I ask because I can see that the Circle constructor was called (as seen in // vector<BaseShape> Circle()Triangle()Circle()).
I was half expecting the code not to compile, or at least a warning that storing a Circle or Triangle in shapes would lose information, but there was none and the code kinda worked. The 'kinda' is because draw() resolved to BaseShape::draw, as if early bound, rather than late binding to Circle or Triangle as a virtual method should. This suggests that shapes is storing contiguous BaseShape memory blocks...
I'm not trying to solve a problem here; I'm just curious about how C++ works and where my misunderstanding of std::vector, pointers, or iterators lies.
Answer:
When storing BaseShape objects in a vector by value, you'll experience what is called object slicing.
Basically, all information that only the derived classes contain is discarded, and only the base class's information is actually stored. All objects will behave as BaseShape objects would, the only exception being that class invariants may be broken by the slicing.