I am trying to make a graph for this data using the following code:
import networkx as nx
import csv
import matplotlib.pyplot as plt
graph = nx.Graph()
filename = "tubedata.csv"
with open(filename) as tube_data:
    starting_station = [row[0] for row in csv.reader(tube_data, delimiter=',')]
with open(filename) as tube_data:
    ending_station = [row[1] for row in csv.reader(tube_data, delimiter=',')]
with open(filename) as tube_data:
    average_time_taken = [row[3] for row in csv.reader(tube_data, delimiter=',')]
with open(filename) as tube_data:
    for line in tube_data:
        graph.add_edge(starting_station, ending_station, weight=average_time_taken)
However, I keep getting the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_101/53822893.py in <module>
17 with open(filename) as tube_data:
18 for line in tube_data:
---> 19 graph.add_edge(starting_station, ending_station, weight=average_time_taken)
/opt/conda/lib/python3.9/site-packages/networkx/classes/graph.py in add_edge(self, u_of_edge, v_of_edge, **attr)
870 u, v = u_of_edge, v_of_edge
871 # add nodes
--> 872 if u not in self._node:
873 self._adj[u] = self.adjlist_inner_dict_factory()
874 self._node[u] = self.node_attr_dict_factory()
TypeError: unhashable type: 'list'
I have searched the error and understand that I need to pass a data structure that is immutable. I changed the code to the following:
with open(filename) as tube_data:
    starting_station = (row[0] for row in csv.reader(tube_data, delimiter=','))
with open(filename) as tube_data:
    ending_station = (row[1] for row in csv.reader(tube_data, delimiter=','))
with open(filename) as tube_data:
    average_time_taken = (row[3] for row in csv.reader(tube_data, delimiter=','))
with open(filename) as tube_data:
    for line in tube_data:
        graph.add_edge(starting_station, ending_station, weight=average_time_taken)
This resolves the above error but produces a graph with only two nodes and one edge. How can I capture the full data as a graph?
Answer:
I would create the graph using the following steps:
- Use the pandas library to read the data into a DataFrame object
- Create an edge list [(source, target, weight)] from the DataFrame rows
- Create an empty directed graph in NetworkX
- Add edges to the DiGraph object by passing in the edge list
import networkx as nx
import pandas as pd
data = pd.read_csv('tubedata.csv',header=None)
edgelist = data.apply(lambda x: (x[0],x[1],x[3]),axis=1).to_list()
# edgelist
# [('Harrow & Wealdstone', 'Kenton', 3),
# ('Kenton', 'South Kenton', 2),
# ('South Kenton', 'North Wembley', 2),
# ('North Wembley', 'Wembley Central', 2),...
G = nx.DiGraph()
G.add_weighted_edges_from(edgelist)
list(G.edges(data=True))[:5]
# [('Harrow & Wealdstone', 'Kenton', {'weight': 3}),
# ('Kenton', 'South Kenton', {'weight': 2}),
# ('South Kenton', 'North Wembley', {'weight': 2}),
# ('North Wembley', 'Wembley Central', {'weight': 2}),
# ('Wembley Central', 'Stonebridge Park', {'weight': 3})]
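For completeness, the loop in the question can also be repaired without pandas. In the original code every add_edge call received the whole column (a list, which is unhashable, or a generator, which is hashable and so was stored as one single node), which is why the graph ended up with two nodes and one edge. Below is a minimal sketch that adds one edge per CSV row instead. The helper name graph_from_rows and the in-memory sample rows are assumptions made for illustration; the third column is a placeholder because the post never shows what it contains, and the sketch keeps the undirected nx.Graph from the question rather than the DiGraph used above.

```python
import csv
import io

import networkx as nx


def graph_from_rows(rows):
    """Build an undirected graph with one edge per CSV row (hypothetical helper)."""
    graph = nx.Graph()
    for row in rows:
        # Each call gets two station names (strings), not whole columns.
        graph.add_edge(row[0], row[1], weight=float(row[3]))
    return graph


# Stand-in for tubedata.csv; "LINE" is a placeholder for the unknown third column.
sample = io.StringIO(
    "Harrow & Wealdstone,Kenton,LINE,3\n"
    "Kenton,South Kenton,LINE,2\n"
    "South Kenton,North Wembley,LINE,2\n"
)
graph = graph_from_rows(csv.reader(sample))
print(graph.number_of_nodes(), graph.number_of_edges())
```

With the real file, the same helper could be called as graph_from_rows(csv.reader(tube_data)) inside a single with open(filename) block.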
You can also get the same DiGraph by going straight to from_pandas_edgelist (see the NetworkX documentation), after renaming the DataFrame columns:
data = data.rename(columns={0:'source',1:'target',3:'average_time_taken'})
G2 = nx.from_pandas_edgelist(data, source='source', target='target', edge_attr='average_time_taken', create_using=nx.DiGraph)
list(G2.edges(data=True))[:5]
# [('Harrow & Wealdstone', 'Kenton', {'average_time_taken': 3}),
# ('Kenton', 'South Kenton', {'average_time_taken': 2}),
# ('South Kenton', 'North Wembley', {'average_time_taken': 2}),
# ('North Wembley', 'Wembley Central', {'average_time_taken': 2}),
# ('Wembley Central', 'Stonebridge Park', {'average_time_taken': 3})]
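One difference between the two routes is the attribute key: the first stores the time under 'weight', the second under 'average_time_taken'. If the key should match (for example because weighted algorithms are usually pointed at 'weight'), the column can be renamed to 'weight' instead. The sketch below checks this on a tiny stand-in DataFrame; the rows and the "LINE" placeholder column are assumptions, not the real tubedata.csv.

```python
import networkx as nx
import pandas as pd

# Stand-in rows for tubedata.csv; column 2 is a placeholder.
data = pd.DataFrame([
    ["Harrow & Wealdstone", "Kenton", "LINE", 3],
    ["Kenton", "South Kenton", "LINE", 2],
])

# Route 1: explicit edge list, attribute stored as 'weight'.
edgelist = list(zip(data[0], data[1], data[3]))
G = nx.DiGraph()
G.add_weighted_edges_from(edgelist)

# Route 2: name the column 'weight' so the attribute key matches route 1.
renamed = data.rename(columns={0: "source", 1: "target", 3: "weight"})
G2 = nx.from_pandas_edgelist(
    renamed,
    source="source",
    target="target",
    edge_attr="weight",
    create_using=nx.DiGraph,
)

# Compare (source, target, weight) triples from both graphs.
same = sorted(G.edges(data="weight")) == sorted(G2.edges(data="weight"))
print(same)
```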