Home > Blockchain >  Boost serialization with recursive data structure ends up in stack overflow
Boost serialization with recursive data structure ends up in stack overflow

Time:12-01

Problem

I'm having trouble using Boost Serialization library with a recursive data structure. More precisely, I want to serialize a matrix which is represented by nodes containing a value, and where each node has access to its neighboors (top,bottom,left,right). In order to access a node, each entry point is stored in a vector (that is the first and the last node of each row and and each column) . Here is the Node class :

class Node
{
private:
    int v;
    Node* left; 
    Node* right;
    Node* top;
    Node* bottom;

public:
    Node() : v(rand() % 100), left(NULL), right(NULL), top(NULL), bottom(NULL)
    {

    }

    //Potential memory leak but that's not the topic
    void setLeft(Node* toSet) { left = toSet; }
    void setRight(Node* toSet) { right = toSet; }
    void setTop(Node* toSet) { top = toSet; }
    void setBottom(Node* toSet) { bottom = toSet; }

    Node* gLeft() { return left; }
    Node* gRight() { return right; }
    Node* gTop() { return top; }
    Node* gBottom() { return bottom; }

    int gValue() { return v; }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& v;
        ar& left;
        ar& right;
        ar& top;
        ar& bottom;
    }
};

Here is the Matrix class, with a generateValues() function for this example :

class Matrix
{
private:
    int m, n;
    std::vector<Node*> firstNodesPerRow;
    std::vector<Node*> lastNodesPerRow;
    std::vector<Node*> firstNodesPerCol;
    std::vector<Node*> lastNodesPerCol;
public:
    Matrix(int m, int n) : 
        m(m), n(n),
        firstNodesPerRow(m, NULL), lastNodesPerRow(m,NULL),
        firstNodesPerCol(n, NULL),lastNodesPerCol(n, NULL)
    {
        
    }

    void generateValues()
    {
        for (int i = 0; i < m; i  )
        {
            for (int j = 0; j < n; j  )
            {
                Node* toWrite = new Node();
                if (i > 0)
                {
                    toWrite->setTop(lastNodesPerCol.at(j));
                    lastNodesPerCol.at(j)->setBottom(toWrite);
                    lastNodesPerCol.at(j) = toWrite;
                }
                else
                {
                    firstNodesPerCol.at(j) = toWrite;
                    lastNodesPerCol.at(j) = toWrite;
                }
                if (j > 0)
                {
                    toWrite->setLeft(lastNodesPerRow.at(i));
                    lastNodesPerRow.at(i)->setRight(toWrite);
                    lastNodesPerRow.at(i) = toWrite;
                }
                else
                {
                    firstNodesPerRow.at(i) = toWrite;
                    lastNodesPerRow.at(i) = toWrite;
                }
            }
        }
    }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& m;
        ar& n;
        ar& firstNodesPerRow;
        ar& firstNodesPerCol;
        ar& lastNodesPerRow;
        ar& lastNodesPerCol;
    }
};

So what I want to achieve is serializing and deserializing a Matrix. Here is my main function :

#include <cstdlib>
#include <sstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/vector.hpp>

int main(int argc, char* argv[])
{
    int m = 10; int n = 10;
    Matrix toSerialize(m,n);
    toSerialize.generateValues();

    /*
        1) Serialize
    */
    std::ostringstream oss;
    boost::archive::text_oarchive oa(oss);
    oa << toSerialize;

    std::string serialiazedData = oss.str();

    /*
        2) Deserialize
    */
    Matrix result(m,n);
    std::stringstream serializedDataStream(serialiazedData);
    boost::archive::text_iarchive ia(serializedDataStream);
    ia >> result;

    return EXIT_SUCCESS;
}

The problem is the following : given a sufficiently large value m or n, main ends up with a stack-overflow exception. I know that's it's coming from the serialize method of Node, because in order to serialize a node, it needs to serialize the neighboors and so on... I found Matrix example

Edit 2 : The solution I had in mind (not working, ends up with a stack-overflow when deserializing).

//Class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

//Class Matrix
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& m;
    ar& n;

    for (int i = 0; i < m; i  )
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            ar& current;
            current = current->gRight();
        }
    }

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}

Solution

The explanation of the solution is given in the post marked as the answer. Here is an implementation of this solution :

// class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

// some buffer struct
struct Neighbors
{
    Node* top;
    Node* bottom;
    Node* left;
    Node* right;

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& top;
        ar& bottom;
        ar& left;
        ar& right;
    }
};

//class Matrix
template<class Archive>
void save(Archive& ar, const unsigned int version) const
{
    std::map<Node*, Neighbors> neighborsPerNode;

    for (int i = 0; i < m; i  )
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            neighborsPerNode[current] = {
                current->gTop(),
                current->gBottom(),
                current->gLeft(),
                current->gRight(),
            };
            current = current->gRight();
        }
    }

    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}
template<class Archive>
void load(Archive& ar, const unsigned int version)
{
    // Warning ALL the nodes are browsed 2 times :
    //  1 - to deserialize neighborsPerNode (below)
    //  2 - in the for loop, to create the links between the nodes
    
    std::map<Node*, Neighbors> neighborsPerNode;
    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;

    for (int i = 0; i < m; i  )
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            Neighbors myNeighbors = neighborsPerNode.at(current);
            current->setTop(myNeighbors.top);
            current->setBottom(myNeighbors.bottom);
            current->setLeft(myNeighbors.left);
            current->setRight(myNeighbors.right);
            current = current->gRight();    
        }
    }
}
BOOST_SERIALIZATION_SPLIT_MEMBER()

CodePudding user response:

Why not simply serialize all elements in order of the matrix and avoid the function call recursion entirely, e.g.:

template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& m;
    ar& n;
    for (int i = 0; i < m;   i)
    {
        Node* node = firstNodesPerRow[i];
        for (int j = 0; j < n;   j)
        {
            ar & node->gValue();
            node = node->gRight();
        }
    }
}

BTW This works in saving the matrix. I think you need to specialize the serialize function in save and load versions, because for the load version you need to:

  1. load n, m
  2. allocate all nodes and populate the node pointer vectors in matrix
  3. load all values in the same order as during saving
    template<class Archive>
    void save(Archive & ar, const unsigned int version) const
    {
        ...
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int version)
    {
        ...
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()

In the complicated case where nodes may be missing, the same idea applies. And you still need to split between save/load, adding allocation to load. But it takes a lot more bookkeeping to correctly save & load again.

E.g., you could first iterate over all nodes as above but creating a map from each Nodes pointer value to an unique increasing ID number. When you save each node's value (as above row by row), also save each direction's node ID. When you load, you first make a empty map: ID -> node pointer. Then restore nodes again row by row, while restoring neighbour pointers as well from map. Of course whenever an ID is first encountered you need to allocate a new node.

  • Related