Hello everyone, I'm working on a project to analyze a website and build a network graph with Python. I chose the themoviedb.org website. The nodes are the movie ids, and the links between nodes are the genres that two movies share: node_A and node_B are linked if they have a genre in common. I extracted the nodes and put them in a list, `nodes`. For example, I have:
```python
[
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'}
]
```
So I want to make a link, for example, between the movie "Puss in Boots: The Last Wish" and the movie "The Enforcer", which share genre 28. As a result I want this edge list:
```
source  target  genre_ids
315162  846433  28
846433  315162  28
76600   536554  878
76600   653851  878
536554  76600   878
```

and so on.
This is my code:

```python
genres = [28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
edges = []
nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]
dictionary = {}

def get_edges():
    for i in nodes:
        if i["genre_ids_1"] in genres:
            dictionary.setdefault(i['genre_ids_1'], []).append(i['label'])
        elif i["genre_ids_2"] in genres:
            dictionary.setdefault(i['genre_ids_2'], []).append(i['label'])
        if i["genre_ids_1"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_1"]][0]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_1"]][0]})
        elif i["genre_ids_2"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_2"]][1]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_2"]][1]})
    print(edges)

get_edges()
```
How can I do this?
Answer:
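The code in the question prints `[]`. The `genres` list holds integers, but every `genre_ids_1`/`genre_ids_2` value in `nodes` is a string, and in Python the string `'28'` never equals the integer `28`. So neither `in genres` test succeeds and `dictionary` stays empty. A quick check on one of the sample movies:

```python
# The genre list from the question holds ints; the node dicts hold genre ids as strings
genres = [28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
node = {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'}

print(node['genre_ids_2'] in genres)       # False: the string '28' is not equal to the int 28
print(int(node['genre_ids_2']) in genres)  # True once the id is converted to int
```

Converting the types alone would not be enough, though: each `edges.append(...)` call pairs a movie's own label with its own id, so no edge would ever connect two different movies. Grouping the movies by genre first avoids that.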
First construct a dict `nodes_by_genre` that maps each genre id to the associated nodes (dicts). Then use `itertools.permutations` to generate the directed edges associated with each genre. Finally, format each directed edge into a tuple for subsequent use.

Note: If you want undirected edges, use `itertools.combinations` instead.
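To make the difference concrete, here are both functions applied to the three movies that share genre 878 in the sample data. `permutations(items, 2)` yields every ordered pair, n * (n - 1) of them (one edge per direction), while `combinations(items, 2)` yields each unordered pair once, n * (n - 1) / 2 of them. The variable names are just for illustration.

```python
from itertools import combinations, permutations

# Labels of the movies that share genre '878' in the sample data
movies_878 = ['M3GAN', 'Avatar: The Way of Water', 'Devotion']

directed = list(permutations(movies_878, 2))    # ordered pairs: 3 * 2 = 6
undirected = list(combinations(movies_878, 2))  # unordered pairs: 3 * 2 / 2 = 3

print(len(directed), directed)
print(len(undirected), undirected)
```

`combinations` keeps the input order inside each pair, so `('M3GAN', 'Devotion')` appears but `('Devotion', 'M3GAN')` does not.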
```python
from itertools import permutations
from pprint import pprint

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]

def get_edges(nodes):
    # Map each genre id to the list of movies (node dicts) that have it
    nodes_by_genre = {}
    for node in nodes:
        nodes_by_genre.setdefault(node['genre_ids_1'], []).append(node)
        nodes_by_genre.setdefault(node['genre_ids_2'], []).append(node)

    # Every ordered pair of movies within one genre becomes a directed edge
    edges = []
    for genre, genre_nodes in nodes_by_genre.items():
        node_pairs = permutations(genre_nodes, 2)
        new_edges = ((node1['label'], node2['label'], genre) for node1, node2 in node_pairs)
        edges.extend(new_edges)
    return edges

edges = get_edges(nodes)
pprint(edges)
```
Output:

```
[('Puss in Boots: The Last Wish', 'The Enforcer', '28'),
 ('The Enforcer', 'Puss in Boots: The Last Wish', '28'),
 ('M3GAN', 'Avatar: The Way of Water', '878'),
 ('M3GAN', 'Devotion', '878'),
 ('Avatar: The Way of Water', 'M3GAN', '878'),
 ('Avatar: The Way of Water', 'Devotion', '878'),
 ('Devotion', 'M3GAN', '878'),
 ('Devotion', 'Avatar: The Way of Water', '878')]
```
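The answer's `get_edges` returns movie labels, whereas the question asks for `source target genre_ids` rows of movie ids. A minimal sketch of the same grouping that emits ids instead, assuming integer genre ids are wanted (matching the int `genres` list in the question); `get_id_edges` is a hypothetical name, not part of any library.

```python
from itertools import permutations

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]


def get_id_edges(nodes):
    """Hypothetical helper: (source_id, target_id, genre_id) for each ordered pair sharing a genre."""
    ids_by_genre = {}
    for node in nodes:
        # dict.fromkeys drops a repeated genre but keeps order, so a movie is never listed twice
        for genre in dict.fromkeys((node['genre_ids_1'], node['genre_ids_2'])):
            # Assumption: genre ids are converted from str to int
            ids_by_genre.setdefault(int(genre), []).append(node['id'])

    edges = []
    for genre, movie_ids in ids_by_genre.items():
        edges.extend((source, target, genre) for source, target in permutations(movie_ids, 2))
    return edges


print('source target genre_ids')
for source, target, genre in get_id_edges(nodes):
    print(source, target, genre)
```

With the sample data this prints eight rows: both directions for genre 28 and all six ordered pairs for genre 878, i.e. the rows listed in the question, though not in the same order. Swapping `permutations` for `combinations` keeps one row per pair.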