Create an edge list of films that share a genre

Time: 01-25

Hello everyone, I'm doing a project to analyze a website and build a network graph with Python. I chose the themoviedb.org website. The nodes are the ids of the movies, and the links between nodes are the genres that two movies share: node_A and node_B are linked if they have at least one genre in common. I extracted the nodes and put them in a list, nodes. For example, I have:

[
{'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
{'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
{'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
{'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
{'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'}
]

So I want, for example, a link between the movie "Puss in Boots: The Last Wish" and the movie "The Enforcer", which share the genre 28. The result I want is this edge list:

source      target      genre_ids
315162      846433      28
846433      315162      28
76600       536554      878
76600       653851      878
536554      76600       878
...and so on.

This is my code:

genres = [28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
edges = []
nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]
dictionary = {}
def get_edges():
    for i in nodes:
        if i["genre_ids_1"] in genres:
            dictionary.setdefault(i['genre_ids_1'], []).append(i['label'])
        elif i["genre_ids_2"] in genres:
            dictionary.setdefault(i['genre_ids_2'], []).append(i['label'])
        if i["genre_ids_1"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_1"]][0]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_1"]][0]})
        elif i["genre_ids_2"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_2"]][1]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_2"]][1]})
    print(edges)
get_edges()

How can I do this?

Answer:

First, build a dict nodes_by_genre that maps each genre id to the list of nodes (dicts) that have that genre. Then use itertools.permutations to generate the directed edges within each genre. Finally, turn each directed edge into a (source, target, genre) tuple for later use.

Note: If you want undirected edges, use itertools.combinations instead.
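To see the difference on a small scale, here is a minimal sketch comparing the two functions on three placeholder labels (the letters are illustrative only, standing in for three movies that share one genre). A genre shared by k movies yields k(k-1) ordered pairs with permutations, but only k(k-1)/2 unordered pairs with combinations.

```python
from itertools import combinations, permutations

# Hypothetical labels standing in for three movies that share one genre
movies = ["A", "B", "C"]

# Directed edges: every ordered pair, so ('A', 'B') and ('B', 'A') both appear
directed = list(permutations(movies, 2))

# Undirected edges: every unordered pair, each listed exactly once
undirected = list(combinations(movies, 2))

print(directed)    # [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
print(undirected)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```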

from pprint import pprint
from itertools import permutations

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'}, 
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'}, 
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'}, 
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'}, 
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]

def get_edges(nodes):
    # Group the movie dicts under each of their two genre ids
    nodes_by_genre = {}
    for node in nodes:
        nodes_by_genre.setdefault(node['genre_ids_1'], []).append(node)
        nodes_by_genre.setdefault(node['genre_ids_2'], []).append(node)
    edges = []
    for genre, genre_nodes in nodes_by_genre.items():
        # Every ordered pair of movies in this genre becomes a directed edge
        node_pairs = permutations(genre_nodes, 2)
        new_edges = ((node1['label'], node2['label'], genre) for node1, node2 in node_pairs)
        edges.extend(new_edges)
    return edges
    
edges = get_edges(nodes)
pprint(edges)

Output:

[('Puss in Boots: The Last Wish', 'The Enforcer', '28'),
 ('The Enforcer', 'Puss in Boots: The Last Wish', '28'),
 ('M3GAN', 'Avatar: The Way of Water', '878'),
 ('M3GAN', 'Devotion', '878'),
 ('Avatar: The Way of Water', 'M3GAN', '878'),
 ('Avatar: The Way of Water', 'Devotion', '878'),
 ('Devotion', 'M3GAN', '878'),
 ('Devotion', 'Avatar: The Way of Water', '878')]
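The question asked for numeric ids in the source and target columns rather than labels. Below is a minimal variation of the same grouping-plus-permutations idea that emits dicts with source, target and genre_id keys, assuming every node stores its genres under keys starting with genre_ids_ (as in the sample data). The helper names genre_ids and get_id_edges are hypothetical, chosen for this sketch. Deduplicating each movie's genres first means a movie whose two genre fields happen to be equal is never paired with itself.

```python
from collections import defaultdict
from itertools import permutations

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]


def genre_ids(node):
    """Return the node's distinct genre values, in field order (hypothetical helper)."""
    values = (value for key, value in node.items() if key.startswith('genre_ids_'))
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(values))


def get_id_edges(nodes):
    """Return one directed edge per ordered pair of movies sharing a genre."""
    ids_by_genre = defaultdict(list)
    for node in nodes:
        for genre in genre_ids(node):
            ids_by_genre[genre].append(node['id'])

    edges = []
    for genre, movie_ids in ids_by_genre.items():
        for source, target in permutations(movie_ids, 2):
            edges.append({'source': source, 'target': target, 'genre_id': genre})
    return edges


edges = get_id_edges(nodes)

# Print with the same column headers as the desired result in the question
print(f"{'source':<12}{'target':<12}genre_ids")
for edge in edges:
    print(f"{edge['source']:<12}{edge['target']:<12}{edge['genre_id']}")
```

With the five sample movies this gives the same 8 edges as the label-based version, with ids in place of labels; swapping permutations for combinations would again give one edge per unordered pair.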