Home > Back-end >  Generate edge list from from nested dictionary
Generate edge list from from nested dictionary

Time:11-09

Below is the tree as a dictionary:

output_dict = { 'Sort': [ { 'Aggregate': [ { 'Hash Join': [{ 'Hash Join': [ { 'Sequential Scan': [ ] }, { 'Hash': [ { 'Sequential Scan': [ ] } ] } ] },{ 'Hash': [ { 'Sequential Scan': [ ] } ] } ] } ] } ] }

I would like to transform the above data into a networkx compatible format. Preferably in the edge list format.

I.E.

('Sort', 'Aggregrate')
('Aggregrate', 'Hash Join'))

I have tried using a BFS template but with no success Appreciate any help given thanks

CodePudding user response:

You can use a recursive generator function:

output_dict = { 'Sort': [ { 'Aggregate': [ { 'Hash Join': [{ 'Hash Join': [ { 'Sequential Scan': [ ] }, { 'Hash': [ { 'Sequential Scan': [ ] } ] } ] },{ 'Hash': [ { 'Sequential Scan': [ ] } ] } ] } ] } ] }
def edges(d, p=None):
   for a, b in d.items():
      if p is not None: 
         yield (p,a)
      yield from (j for k in b for j in edges(k, p=a))

print(list(edges(output_dict)))

Output:

[('Sort', 'Aggregate'), ('Aggregate', 'Hash Join'), ('Hash Join', 'Hash Join'), ('Hash Join', 'Sequential Scan'), ('Hash Join', 'Hash'), ('Hash', 'Sequential Scan'), ('Hash Join', 'Hash'), ('Hash', 'Sequential Scan')]

CodePudding user response:

You could use a simple DFS like traversal through the nested dictionary:

def edge_list(dictionary):
    if not dictionary:
        return []
    edges = []
    for root in dictionary:
        if dictionary[root]:
            for children in dictionary[root]:
                edges.extend((root, child) for child in children)  # add direct child links
                edges.extend(edge_list(children))  # recursive call
    return edges


print(edge_list(output_dict))

Output:

[('Sort', 'Aggregate'), ('Aggregate', 'Hash Join'), ('Hash Join', 'Hash Join'), ('Hash Join', 'Sequential Scan'), ('Hash Join', 'Hash'), ('Hash', 'Sequential Scan'), ('Hash Join', 'Hash'), ('Hash', 'Sequential Scan')]

I think this might be what you want.

CodePudding user response:

The data format is kind of annoying with all the nested lists, really you could combine all the dictionaries in every list of yours, so I'd logically split up the function to work on a list and the name of the current node since the looping is a little easier to communicate:

def edges_from_node_and_children(root, list_of_children):
    for UNNECESSARILY_NESTED_DICT in list_of_children:
      for child, further_children in UNNECESSARILY_NESTED_DICT.items():
        # for each child produce the link between the root and this child
        yield (root, child)
        # as well as all links from the child further down
        yield from edges_from_node_and_children(child, further_children)
        
def edges(d):
    for node,children in d.items():
        yield from edges_from_node_and_children(node,children)

Then if you wanted to streamline the iterators with itertools you could use chain.from_iterable and starmap to cut out most of the loops:

import itertools
from typing import Dict,List
List_of_children_type = List[Dict[str, 'List_of_children_type']]

def edges_from_node_and_children(root:str, children: List_of_children_type):
    #for TEMP in children:
    # for child, further_children in TEMP.items():
    ## or equivelent chain iterator loop
    for child, further_children in itertools.chain.from_iterable(map(dict.items, children)):
        # for each child produce the link between the root and this child
        yield (root, child)
        # as well as all links from the child further down
        yield from edges_from_node_and_children(child, further_children)
        
def edges(d):
    return itertools.chain.from_iterable(itertools.starmap(edges_from_node_and_children, d.items()))
    ## or equivelently
    #for node,children in d.items():
    #    yield from edges_from_node_and_children(node,children)

# this has the same structure as your list but the elements are all unique so I could properly test it
output_dict = {
    'A': [
        { 'B': [
            { 'C': [
                { 'D1': [
                    { 'E1': [ ] },
                    { 'E2': [
                        { 'F': [ ] }
                ]}]},
                { 'D2': [
                    { 'Z': [ ] }
    ]}]}]}]}

# or print(list(edges(output_dict)))
print(*edges(output_dict), sep="\n")
  • Related