I want to create a NetworkX graph from a Pandas adjacency matrix.
Usually this works with nx.from_pandas_adjacency(df). But this time I have an affiliation network, so the row labels and column labels are different sets of nodes.
Here is an example:
import pandas as pd
import numpy as np
import networkx as nx
#create dummy df
rng = np.random.RandomState(seed=5)
ints = rng.randint(1, 11, size=(4, 2))
df = pd.DataFrame(ints, columns=["Book1","Book2"])
df["Tag"]=['music','city','transport','traveling']
df.set_index('Tag', inplace=True)
# show dummy df
print(df)
This gives the dummy df:
           Book1  Book2
Tag
music          4      7
city           7      1
transport     10      9
traveling      5      8
If I now try to use nx.from_pandas_adjacency(df), I get the following error message:
networkx.exception.NetworkXError: ('Columns must match Indices.',
"['music', 'city', 'transport', 'traveling'] not in columns")
I could loop over the df, put the information into an edge list, and then pass it to NetworkX like this:
edgelist = [
    ('Book1', 'music', 4),
    ('Book2', 'music', 7),
    ('Book1', 'city', 7),
    ('Book2', 'city', 1),
    # ... and so on
]
But I am sure there is a much better and more efficient way to do this. The real df has 1000 Books and 90% NaN values (no edge).
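For reference, the loop-based approach described above might look like the following sketch. The frame is rebuilt from the printed values rather than from the random generator, and the names edgelist and G_loop are illustrative only. NaN cells are skipped on the assumption that NaN means "no edge".

```python
import pandas as pd
import networkx as nx

# Dummy frame rebuilt from the printed values shown above.
df = pd.DataFrame(
    {"Book1": [4, 7, 10, 5], "Book2": [7, 1, 9, 8]},
    index=pd.Index(["music", "city", "transport", "traveling"], name="Tag"),
)

# Walk every (tag, book) cell and keep the non-NaN ones as weighted edges.
edgelist = []
for tag, row in df.iterrows():
    for book, weight in row.items():
        if pd.notna(weight):  # assumption: NaN means "no edge"
            edgelist.append((book, tag, weight))

# Each 3-tuple is read as (node_a, node_b, weight).
G_loop = nx.Graph()
G_loop.add_weighted_edges_from(edgelist)
```

This works, but it visits every cell in Python, which is the part that gets slow on a frame with 1000 book columns.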
CodePudding user response:
You can melt the dataframe into long format (one row per Tag/Book pair) and then use nx.from_pandas_edgelist:
G = nx.from_pandas_edgelist(
    df.reset_index().melt(id_vars='Tag'),
    source='Tag',
    target='variable',
    edge_attr='value'
)
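Since the real df is mostly NaN, melting it as-is would probably create an edge for every NaN cell as well. A minimal sketch of one way to handle that, assuming NaN means "no edge", is to drop those rows before building the graph. The frame sparse_df and the column names Book and weight are made up for illustration.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Hypothetical sparse version of the dummy frame; NaN marks a missing edge.
sparse_df = pd.DataFrame(
    {"Book1": [4, np.nan, 10, np.nan], "Book2": [np.nan, 1, 9, 8]},
    index=pd.Index(["music", "city", "transport", "traveling"], name="Tag"),
)

# Long format: one row per (Tag, Book) pair, then discard the NaN pairs.
edges = (
    sparse_df.reset_index()
    .melt(id_vars="Tag", var_name="Book", value_name="weight")
    .dropna(subset=["weight"])
)

G_sparse = nx.from_pandas_edgelist(
    edges, source="Tag", target="Book", edge_attr="weight"
)
```

Naming the melted columns with var_name and value_name is optional; it only replaces the default variable and value labels, so the edge attribute ends up being called weight.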