I have a JSON Lines file in which each item maps a node (the key) to a list of the other nodes it is connected to. To add the edges to a networkx graph, I think I need tuples of the form (u, v). I wrote a naive solution, but I suspect it will be slow for large enough JSONL files. Does anyone have a better, more Pythonic solution to suggest?

dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]
for node in dol:
    key = list(node.keys())[0]  # each dict holds a single node
    tpls = [(key, v) for v in node[key]]
    print(tpls)

# ...then iterate through each list to add the edges to the graph

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
[(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
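A minimal sketch of the whole pipeline, assuming each line of the file is a JSON object such as {"0": [1, 2, 3, 4, 5, 6]}. One detail worth knowing: JSON object keys are always strings, so json.loads returns "0", not 0. The sketch converts keys with int(), which assumes your node labels are integers. The function name edges_from_jsonl is hypothetical, and an in-memory io.StringIO stands in for the real file.

```python
import io
import json
from typing import Iterable, Iterator, Tuple

Edge = Tuple[int, int]

def edges_from_jsonl(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield (u, v) pairs from JSON Lines text, one {node: [neighbours]} object per line."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        for key, neighbours in json.loads(line).items():
            u = int(key)  # JSON keys are strings; assumes integer node labels
            for v in neighbours:
                yield (u, v)

# Stand-in for open("graph.jsonl"); the filename is hypothetical
sample = io.StringIO('{"0": [1, 2, 3, 4, 5, 6]}\n{"1": [0, 2, 3, 4, 5, 6]}\n')
edges = list(edges_from_jsonl(sample))
print(edges[:3])  # [(0, 1), (0, 2), (0, 3)]
```

With networkx you can pass the generator straight to Graph.add_edges_from inside a "with open(...) as f:" block, e.g. G.add_edges_from(edges_from_jsonl(f)), so the full edge list never has to be built in memory.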

Answer:

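dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]

def process(item: dict):
    # Yield one (node, neighbour) tuple at a time instead of building a list
    for key, values in item.items():
        for i in values:
            yield (key, i)

results = map(process, dol)  # lazy: one generator per dict
print([list(r) for r in results])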

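I think you should use yield where you can.

I don't know how big your dataset is, but generators are more memory efficient: they produce one tuple at a time instead of building every list up front. That saving only holds if you consume them lazily (for example by passing them straight to networkx's add_edges_from); wrapping each one in list(), as the print above does for display, builds the lists after all.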
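To keep the whole pipeline lazy, the per-dict generators produced by map(process, dol) can be flattened into one stream of edges with itertools.chain.from_iterable; nothing is materialised until the stream is consumed. A sketch that repeats the process generator so it runs on its own:

```python
from itertools import chain

dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

def process(item: dict):
    # Yield one (node, neighbour) tuple per entry in the item's lists
    for key, values in item.items():
        for value in values:
            yield (key, value)

# One flat, lazy iterator over every edge; no intermediate lists are built
edges = chain.from_iterable(map(process, dol))

# Consume it, e.g. with G.add_edges_from(edges) in networkx; here we just count
edge_count = sum(1 for _ in edges)
print(edge_count)  # 12
```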

Happy coding

Answer:

Only one key

If a dict never has more than one item, you can do this:

dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    k, values = local_node.popitem()
    print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
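If each dict is guaranteed to hold exactly one item, you can also skip the copy and unpack the single (key, values) pair directly. Unpacking into a one-element target list raises ValueError if a dict unexpectedly has zero or several items, which doubles as a sanity check on the input. A sketch:

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

edges = []
for node in dol:
    # Raises ValueError unless node has exactly one item; node is not modified
    [(k, values)] = node.items()
    edges.extend((k, value) for value in values)

print(edges[:2])  # [(0, 1), (0, 2)]
```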

Multiple keys

But if a dict may contain more than one key, you can use a while loop that pops items until the dict is empty:

for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    while local_node:
        k, values = local_node.popitem()
        print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]

Of course, if you need to store the generated list, append it to a list instead of just printing it.
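Note that dict.popitem() removes items in LIFO order (guaranteed since Python 3.7), so in the while loop above a multi-key dict yields its keys in reverse insertion order. If order matters, or you just want to avoid the copy, iterating over .items() handles any number of keys and keeps insertion order. A sketch that stores the edges in a list, using a hypothetical multi-key input:

```python
# Hypothetical input: the first dict holds two nodes
dol = [{0: [1, 2], 2: [0, 1]}, {1: [0, 2]}]

edges = []
for node in dol:
    # .items() walks the keys in insertion order and leaves node untouched
    for k, values in node.items():
        edges.extend((k, value) for value in values)

print(edges)
# [(0, 1), (0, 2), (2, 0), (2, 1), (1, 0), (1, 2)]

# For comparison, popitem() on a copy returns the last-inserted key first
print(dol[0].copy().popitem()[0])  # 2
```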

Only one big dictionary

If your dol object can be a single dictionary, it's even simpler. And if, as Yves Daoust said, you need an adjacency list or matrix, here are two examples:

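Adjacency list in pure Python

An adjacency list (strictly speaking, a list of (u, v) pairs like this is usually called an edge list, which is exactly what networkx's add_edges_from takes):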

dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

adjacency_list = [(key, value) for key, values in dol.items() for value in values]
print(adjacency_list)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
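In this data every connection is listed from both ends, e.g. (0, 1) and (1, 0). For an undirected graph, networkx's Graph simply merges such duplicates, but if you want a deduplicated list yourself, one option is to store each pair once in a canonical orientation. This sketch assumes node labels are mutually comparable (e.g. all integers) so that min/max give a stable order:

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

seen = set()
undirected_edges = []
for key, values in dol.items():
    for value in values:
        # Canonical orientation: smaller label first (assumes comparable labels)
        edge = (min(key, value), max(key, value))
        if edge not in seen:
            seen.add(edge)
            undirected_edges.append(edge)

print(undirected_edges)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
```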

Adjacency matrix with pandas

An adjacency matrix (rows are the keys, columns are every neighbour that appears):

import pandas
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

adjacency_list = [(key, value) for key, values in dol.items() for value in values]
adjacency_df = pandas.DataFrame(adjacency_list)
adjacency_matrix = pandas.crosstab(adjacency_df[0], adjacency_df[1],
                                   rownames=['keys'], colnames=['values'])
print(adjacency_matrix)
# values  0  1  2  3  4  5  6
# keys                       
# 0       0  1  1  1  1  1  1
# 1       1  0  1  1  1  1  1
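If pandas is not otherwise needed, a dense adjacency matrix can also be built with plain nested lists. This sketch assumes the nodes are the integers 0..n-1, so each label can be used directly as a row and column index; arbitrary labels would first need a label-to-index mapping.

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

# Number of nodes: every label that appears as a key or as a neighbour
n = 1 + max(max(dol), max(v for values in dol.values() for v in values))

# n x n matrix of zeros; row = source node, column = neighbour
matrix = [[0] * n for _ in range(n)]
for key, values in dol.items():
    for value in values:
        matrix[key][value] = 1

for row in matrix[:2]:
    print(row)
# [0, 1, 1, 1, 1, 1, 1]
# [1, 0, 1, 1, 1, 1, 1]
```

Unlike the crosstab above, this matrix is square: rows 2 to 6 exist and are all zeros, because those nodes have no entry of their own in dol.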

Answer:

You could use a list comprehension:

dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]

tuples = [(n1, n2) for d in dol for n1, ns in d.items() for n2 in ns]

print(tuples)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2),
#  (1, 3), (1, 4), (1, 5), (1, 6)]
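Swapping the square brackets for parentheses turns the comprehension into a generator expression, which produces the tuples lazily, in line with the memory argument of the first answer. One caveat: a generator can only be consumed once, so a second pass over it sees nothing.

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

# Same nested loops as the list comprehension, but evaluated lazily
tuples = ((n1, n2) for d in dol for n1, ns in d.items() for n2 in ns)

first_pass = list(tuples)
second_pass = list(tuples)  # already exhausted
print(len(first_pass), len(second_pass))  # 12 0
```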