I want to build a graph that is a grid of size 2N 1 times 2N 1, with each edge being present with probability $p$.
Here's my code.
import networkx as nx
import matplotlib.pyplot as plt
from tqdm import tqdm
import numpy as np
def getGrid(N):
G=nx.grid_2d_graph(2*N 1,2*N 1)
return G
def getGridBond(N,p):
G = getGrid(N)
for elt in list(G.edges):
pt_1 = elt[0]
pt_2 = elt[1]
if np.random.binomial(1,p) == 0:
G.remove_edge(pt_1,pt_2)
return G
for i in tqdm(range(0,100)):
getGridBond(100,0.5)
This code takes a long time, with each iteration taking about 1.7s.
How can I make it faster?
Thank you.
CodePudding user response:
There are a few things going on here with the randomized steps. Here's your code:
def getGridBond(N,p):
G = getGrid(N)
for elt in list(G.edges):
pt_1 = elt[0]
pt_2 = elt[1]
if np.random.binomial(1,p) == 0:
G.remove_edge(pt_1,pt_2)
return G
First, you don't need to define pt_1
and pt_2
and then remove the edge. Instead, you could remove the edge with G.remove_edge(*elt)
. However, we can do better. You should work under the assumption that each random number generation is expensive. When you're working with random numbers where sometimes something happens and sometimes something doesn't, there are often ways to turn the calculation into one or two random number generations which you can use to find all the "true" cases.
You could slightly speed up your if np.random.binomial(1,p)==0
call to if np.random.random()<p
(assuming my understanding of p
is correct).
However, we can do better: instead of doing this, np.random.binomial(len(G.edges),
1-p)gives a number from the correct distribution for how many edges to delete. Then choose this number of edges, with
np.random.choice(list(G.edges),...)` (without replacement).
So
def getGridBond(N,p):
G = getGrid(N)
edgelist = list(G.edges)
delete_count = np.random.binomial(len(edges), 1-p)
delete_edges = np.random.choice(edgelist, size = delete_count, replace=False)
G.remove_edges_from(delete_edges)
return G
should be faster.
CodePudding user response:
I took Joel answer, resolved errors, added caching and some other improvements. It turned out to be ~20x faster (disregarding first run, before cache makes a difference).
import networkx as nx
from tqdm import tqdm
import numpy as np
from functools import lru_cache
@lru_cache(maxsize=None)
def getGrid2(N):
G=nx.grid_2d_graph(2*N 1,2*N 1)
return G
def getGridBond2(N, p):
G = getGrid2(N)
edgelist = [edge for edge in G.edges()]
n_endges = len(edgelist)
delete_count = np.random.binomial(n_endges, 1 - p)
to_remove_indices = np.random.choice(np.arange(n_endges), size=delete_count, replace=False)
delete_edges = [edgelist[idx] for idx in to_remove_indices]
G.remove_edges_from(delete_edges)
return G
my current profiling results
Line # Hits Time Per Hit % Time Line Contents
==============================================================
40 @do_profile(follow=[getGridBond, getGridBond2])
41 def main():
42 1 3188659.0 3188659.0 8.3 getGridBond(100,0.5)
43 1 2046530.0 2046530.0 5.3 getGridBond2(100,0.5)
44 11 23853.0 2168.5 0.1 for i in tqdm(range(0,10)):
45 10 31481416.0 3148141.6 82.3 getGridBond(100,0.5)
46 10 1515063.0 151506.3 4.0 getGridBond2(100,0.5)