I have a jsonlines file whose items each have a node as the key and, as the value, a list of the other nodes it is connected to. Adding the edges to a networkx graph requires (I think) tuples of the form (u, v). I wrote a naive solution, but I suspect it will be slow for large jsonl files. Does anyone have a better, more Pythonic solution to suggest?
import networkx as nx

dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
G = nx.Graph()
for node in dol:
    key = list(node.keys())[0]
    tpls = [(key, v) for v in node[key]]
    print(tpls)
    G.add_edges_from(tpls)  # add this node's edges to the graph
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
Answer:
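dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

def process(item: dict):
    # Yield one (key, neighbour) pair at a time instead of building a list
    for key, values in item.items():
        for i in values:
            yield (key, i)

results = map(process, dol)
print([list(r) for r in results])
# [[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)], [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]]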
I think you should use yield where you can.
I don't know how big your dataset is, but a generator produces the pairs one at a time instead of building every list up front, so it is more memory efficient on large inputs. (The final print() materialises everything only to show the result; in real code you would iterate over the generators directly.)
Happy coding
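Since the data lives in a jsonlines file, the same generator idea can be pushed one step further by reading the file lazily, one line at a time. This is a minimal sketch, assuming each line holds one JSON object such as {"0": [1, 2, 3]}. JSON object keys are always strings, so the sketch converts them with int(), which assumes integer node ids. The helper name iter_edges and the file name graph.jsonl are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, Tuple

Edge = Tuple[int, int]


def iter_edges(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield (u, v) pairs from jsonlines text, one line at a time.

    Hypothetical helper: assumes each non-blank line is a JSON object that
    maps a node id to a list of neighbour ids, and that all ids are integers.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        item = json.loads(line)
        for key, neighbours in item.items():
            u = int(key)  # JSON object keys are always strings: "0" -> 0
            for v in neighbours:
                yield (u, v)


# io.StringIO stands in for open("graph.jsonl") to keep the example self-contained
sample = io.StringIO('{"0": [1, 2, 3]}\n{"1": [0, 2]}\n')
edges = list(iter_edges(sample))
print(edges)  # [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2)]
```

With a real file, something like `with open("graph.jsonl") as f: G.add_edges_from(iter_edges(f))` should feed the pairs straight into the graph without an intermediate list, because add_edges_from only needs an iterable of (u, v) tuples.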
Answer:
Only one key
If the dicts never have more than one item, you can do this:
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
for node in dol:
    local_node = node.copy()  # only if the dict shouldn't be modified in any way
    k, values = local_node.popitem()
    print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
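When every dict is guaranteed to hold exactly one key, the copy and popitem() can also be avoided by unpacking the single item directly. A small sketch of that variant: the trailing comma in `(k, values), = node.items()` unpacks a one-element view and raises ValueError if a dict unexpectedly has zero or several keys, which doubles as a sanity check. The original dicts are left untouched.

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

edges = []
for node in dol:
    # Single-item unpacking: fails loudly unless the dict has exactly one key
    (k, values), = node.items()
    edges.extend((k, value) for value in values)

print(edges)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
```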
Multiple keys
But if a dict may contain more than one item, you can use a while loop that runs until the dict is empty:
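# Hypothetical input: the first dict has two keys.
# popitem() removes the most recently inserted key first (LIFO).
dol = [{2: [0, 2, 3, 4, 5, 6], 0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
for node in dol:
    local_node = node.copy()  # only if the dict shouldn't be modified in any way
    while local_node:
        k, values = local_node.popitem()
        print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(2, 0), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]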
Of course, if you need to store the generated list, append it to a list instead of just printing it.
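A sketch of the storing version for the multi-key case, on a small made-up input. It uses extend() rather than append(), so the result is one flat list of tuples instead of a list of lists. Note that dict.popitem() is guaranteed to be LIFO since Python 3.7, so key 2 comes out before key 0 here.

```python
# A small made-up input: the first dict has two keys.
dol = [{0: [1, 2], 2: [0]}, {1: [0, 2]}]

edges = []
for node in dol:
    local_node = node.copy()  # keep the original dicts intact
    while local_node:
        k, values = local_node.popitem()  # LIFO: key 2 comes out before key 0
        # extend() adds each tuple separately; append() would nest a whole list
        edges.extend((k, value) for value in values)

print(edges)  # [(2, 0), (0, 1), (0, 2), (1, 0), (1, 2)]
```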
Only one big dictionary
If your dol object can be a single dictionary, it's even simpler. If, as Yves Daoust said, you need an adjacency list or matrix, here are two examples:
Edge list in pure Python
A list of (u, v) pairs, which is strictly an edge list (the dict itself is already an adjacency list):
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}
edge_list = [(key, value) for key, values in dol.items() for value in values]
print(edge_list)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
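Each undirected edge appears twice in this list, once as (0, 1) and once as (1, 0). A networkx Graph treats those as the same edge, but if the pairs are needed elsewhere, here is a sketch that keeps one copy of each by normalising the order. It assumes the graph is undirected and that node ids are comparable with each other.

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

# Sort each pair so (1, 0) and (0, 1) become the same tuple, then deduplicate.
# dict.fromkeys keeps first-seen order, unlike a plain set.
undirected = list(dict.fromkeys(tuple(sorted((key, value)))
                                for key, values in dol.items()
                                for value in values))
print(undirected)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
```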
Adjacency matrix with pandas
An adjacency matrix:
import pandas
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}
edge_list = [(key, value) for key, values in dol.items() for value in values]
adjacency_df = pandas.DataFrame(edge_list)
adjacency_matrix = pandas.crosstab(adjacency_df[0], adjacency_df[1],
                                   rownames=['keys'], colnames=['values'])
print(adjacency_matrix)
# values  0  1  2  3  4  5  6
# keys
# 0       0  1  1  1  1  1  1
# 1       1  0  1  1  1  1  1
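If pandas is not otherwise needed, the same 0/1 table can be built with plain lists. A sketch, assuming rows should be the dict's keys and columns every node seen anywhere, both in sorted order; for this data that gives the same columns as the crosstab above. Like crosstab, it counts occurrences, so a neighbour listed twice would produce a 2.

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

rows = sorted(dol)  # nodes that have an entry in dol
cols = sorted(set(dol) | {v for values in dol.values() for v in values})
col_index = {node: i for i, node in enumerate(cols)}

matrix = []
for key in rows:
    row = [0] * len(cols)
    for value in dol[key]:
        row[col_index[value]] += 1  # count, like crosstab
    matrix.append(row)

print(cols)  # [0, 1, 2, 3, 4, 5, 6]
for key, row in zip(rows, matrix):
    print(key, row)
# 0 [0, 1, 1, 1, 1, 1, 1]
# 1 [1, 0, 1, 1, 1, 1, 1]
```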
Answer:
You could use a list comprehension:
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
tuples = [(n1, n2) for d in dol for n1, ns in d.items() for n2 in ns]
print(tuples)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2),
#  (1, 3), (1, 4), (1, 5), (1, 6)]
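Swapping the square brackets for parentheses turns the same comprehension into a generator expression, which combines this approach with the memory point made in the yield answer: the pairs are produced on demand (for example when passed straight to networkx's add_edges_from) instead of being stored in a list first. A generator can only be consumed once, so this sketch pulls a few items and counts the rest just to show what it contains.

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

# Same nested loops as the list comprehension, but evaluated lazily
edges = ((n1, n2) for d in dol for n1, ns in d.items() for n2 in ns)

first_three = [next(edges) for _ in range(3)]
print(first_three)  # [(0, 1), (0, 2), (0, 3)]

remaining = sum(1 for _ in edges)  # consumes the rest of the generator
print(remaining)  # 9
```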