Home > Back-end >  How to version a graph in boost graph library
How to version a graph in boost graph library

Time:12-21

I'm using the enter image description here

To me, this seems reasonably "meaningful".

Your use case, as you have described it, seems not to need anything more complex than this.

Here is your description

  1. If the family tree had a version of the family in the 1990's and one in 2010's, how could I extend the graph considering v1990 and v2010?

  2. When working with a specific version, can I isolate or exclusively use that version's data in my computations?

  3. How is serialization affected?

In my proposal:

  1. The graph is not extended. The versioning is done by a version control package such as GIT

  2. You check out the required version from the version control package

  3. You will need to design the serialization format with some care so that the differences will be most meaningful to you. You can try using a serialization function provided by the library, but will likely need to write your own to get the maximum "meaning".

If you have any more requirements than these that necessitate anything more complex, then you will need to to specify them in your question

CodePudding user response:

I would indeed reckon the data structure you are looking for is a versioning tree. E.g. one where nodes can be shared COW.

I'm not aware of things like this in the boost library.

But you can create your own data structures that have these properties. I have created one before for this answer: Transforming trees in C (look for CowTree).

Now, I'd be happy to draft something. However, your requirements are essential to the choices.

  • What trees would you need to be able to represent? I mean, are family trees expected to be strict DAG? That should be a safe assumption for biological family relations, but as soon as you include non-biological relations you could require more flexibility.

  • What kind of mutations exist to go from Treeᵥ₁ to Treeᵥ₂? If they are additive (again, as in history of biological relations) then you might be able to leverage graph adaptors:

    • subgraph, where previous versions are subgraphs of later versions (the overhead would grow linear with the number of versions)
    • filtered_graph, where you could use a "version timestamp" value with each node in the tree; your graph at a version (t) could then a view of the most recent version filtered for timestamp <= t. The big benefit would be that the overhead is constant regardless of the number of versions.

I have many examples of using either adaptor in existing StackOverflow answers, in case you want to get more inspiration.

I'll mull the options while making dinner. If I think of useful demo of either approach I'll add it tonight.

Added

I just thought of the idea to represent a graph as an append-only edge list; this would be a good match if the relations need to be versioned over time, relations never (or rarely) disappear and the contents of vertices themselves need not be versioned.

As you can see, for the example code this might be an excellent match, depending perhaps on the efficiency requirements on certain algorithms. To be future-requirement safe I wouldn't jump to this appraoch, but it may have the most merit until now.

  • Related