I have a DataFrame that contains movies, actors' names, etc., and it has about 41k rows.
I'm planning to build a graph with the NetworkX library, using actors as nodes and adding an edge between two actors if they played in the same movie. I tried doing it with the DataFrame and for loops, but I couldn't get it to work. Can you help me?
Edit: I want to make a graph like this:
Answer:
IIUC, let's try something like this using the networkx and itertools libraries:
from itertools import tee
import networkx as nx
import pandas as pd
df = pd.DataFrame({'Movie': [*'AAABBCCCDD'],
                   'Actor': [1, 2, 3, 2, 5, 7, 8, 9, 10, 8]})
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
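G = nx.Graph()
for _, s in df.groupby('Movie'):
    if s.shape[0] > 1:
        # pairwise links consecutive actors in the movie: (1, 2), (2, 3), ...
        G.add_edges_from(pairwise(s['Actor']))
    else:
        # a movie with a single actor still contributes that actor as a node
        G.add_node(s['Actor'].iloc[0])
nx.draw_networkx(G)  # drawing requires matplotlib
[list(i) for i in nx.connected_components(G)]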
And the actor groups (connected components):
[[1, 2, 3, 5], [8, 9, 10, 7]]
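Note that `pairwise` only links consecutive actors within each movie: for movie A it adds 1-2 and 2-3, but not 1-3. The connected components come out the same, but if you want an edge between every pair of actors who share a movie (as the question describes), one option is to swap in `itertools.combinations`, which turns each movie's cast into a clique. Below is a minimal sketch on the same toy data. The `weight` attribute (number of shared movies) is an extra that the question did not ask for, and the column names `Movie`/`Actor` are assumed to match your real frame.

```python
from itertools import combinations

import networkx as nx
import pandas as pd

# Same toy data as above: each row is one (movie, actor) appearance.
df = pd.DataFrame({'Movie': [*'AAABBCCCDD'],
                   'Actor': [1, 2, 3, 2, 5, 7, 8, 9, 10, 8]})

G = nx.Graph()
# Add every actor as a node first, so actors who never share a movie still appear.
G.add_nodes_from(df['Actor'].unique().tolist())

for _, s in df.groupby('Movie'):
    # drop_duplicates guards against the same actor being listed twice for one movie
    actors = s['Actor'].drop_duplicates().tolist()
    # combinations(actors, 2) yields every unordered pair in the cast
    for u, v in combinations(actors, 2):
        if G.has_edge(u, v):
            # these two actors already co-starred: count one more shared movie
            G[u][v]['weight'] += 1
        else:
            G.add_edge(u, v, weight=1)

print(sorted(tuple(sorted(e)) for e in G.edges()))
# [(1, 2), (1, 3), (2, 3), (2, 5), (7, 8), (7, 9), (8, 9), (8, 10)]
print([sorted(c) for c in nx.connected_components(G)])
# [[1, 2, 3, 5], [7, 8, 9, 10]]
```

On a 41k-row frame the same loop should work as-is, but keep in mind that a movie with n listed actors adds n(n-1)/2 edges, so very large casts grow the edge count quickly.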