Is there a standard algorithm or library for representing data deltas?-CodePudding

I have a Python dictionary with fairly complex structure — multiple layers of nested values, some of which are dicts and some of which are lists. I want to represent changes to the data in a compact way that can be applied easily.

For dictionary-only values, it seems not too hard — you can just make a dict that mirrors the structure of the main data but including only the modified keys on their parents, and call a slightly modified .update() that detects a tombstone value in case you need to delete a key entirely.

But with lists involved, it seems to get a lot trickier. It seems like I'd need to come up with some kind of custom addressing scheme that needs to account for a lot of cases — you can't just naively use list indices as keys, because you need to support e.g. the removal of element 5 at the same time as an insertion between elements 2 and 3.

Additionally, if lists aren't restricted to being leaves, it's tricky to specify changes to items contained in a list while also modifying the elements of that list.

Is there a Python library that standardizes something like this? Or a standard algorithm/approach that's relatively sane to implement?

for reference, here is a function that implements what I'm looking for for dict-only data:

def update(d, u):
    for k, v in u.items():
        if v == 'del':
            del d[k]
        elif isinstance(v, collections.abc.Mapping):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d

>>> d = {1: 2, 3: {4: 5, 6: 7}}
>>> delta = {3: {4: 'del', 6: 8}, 9: 10}
>>> update(d, delta)
{1: 2, 3: {6: 8}, 9: 10}

CodePudding user response：

From the user's point of view, I don't think list indices are as much of an issue as you say it is. The indices will only have changed after all the manipulations are done. Use the old indices during the list's manipulation.

From the implementation's point of view, we'll have to be super careful when manipulating the list indices. What we can do is maintain an "index delta i" though the iteration, and instead of modifying l[k], we modify l[k i].

Here I'll use a dictionary for the delta, with list indices as the key, and these three possible values:

'del' to remove the item at this index;
('insert', v) to insert value v just before this index; and
v to modify the value to v at this index.

Note that 'del' and v are mutually exclusive, but 'insert' can be cumulative: it's possible to insert more than one element at the same index, and it's possible to insert elements just before an index and delete or modify the element at that index. Se we want our dict delta to be able to map a key to more than one updates; i.e., to map a key to a list.

from operator import itemgetter

def update(d, u):
    if isinstance(d, dict):
        return update_dict(d, u)
    elif isinstance(d, list):
        return update_list(d, u)

def update_dict(d, u):
    for k, v in u.items():
        if v == 'del':
            del d[k]
        elif isinstance(v, dict):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d

def update_list(d, u):
    i = 0
    for k, v in sorted(u.items(), key=itemgetter(0)):
        if isinstance(v, list):
            for x in v:
                i = update_list_once(d, i, k, x)
        else:
            i = update_list_once(d, i, k, v)
    return d

def update_list_once(d, i, k, v):
    if v == 'del':
        del d[k i]
        i -= 1
    elif isinstance(v, tuple) and len(v) == 2 and v[0] == 'insert':
        d.insert(k i, v[1])
        i  = 1
    else:
        if isinstance(v, dict):
            d[k   i] = update(d[k i], v)
        else:
            d[k i] = v
    return i

Testing:

d = {1: 2, 3: {4: [0, 1, 2, 3, 4, 5], 6: 7}}
delta = {3: {4: {0: 'fizzbuzz', 3: 'fizz', 4: [('insert', 3.5), 4.001], 5: 'buzz'}, 6: 8}, 9: 10}
d = update(d, delta)
print(d)
# {1: 2, 3: {4: ['fizzbuzz', 1, 2, 'fizz', 3.5, 4.001, 'buzz'], 6: 8}, 9: 10}