Home > Net >  Counting triangles in a graph by iteratively removing high-degree nodes
Counting triangles in a graph by iteratively removing high-degree nodes

Time:01-03

Computing nx.triangles(G) on an undirected graph with about 150 thousand nodes and 2 million edges, is currently very slow (on the scale of 80 hours). If the node degree distribution is highly skewed, is there any problem with counting triangles using the following procedure?

import networkx as nx

def largest_degree_node(G):
    # this was improved using suggestion by Stef in the comments
    return max(G.degree(), key=lambda x: x[1])[0]

def count_triangles(G):
    G=G.copy()
    triangle_counts = 0
    while len(G.nodes()):
        focal_node = largest_degree_node(G)
        triangle_counts  = nx.triangles(G, nodes=[focal_node])[focal_node]
        G.remove_node(focal_node)
    return triangle_counts

G = nx.erdos_renyi_graph(1000, 0.1)

# compute triangles with nx
triangles_nx = int(sum(v for k, v in nx.triangles(G).items()) / 3)

# compute triangles iteratively
triangles_iterative = count_triangles(G)

# assertion passes
assert int(triangles_nx) == int(triangles_iterative)

The assertion passes, but I am wary that there are some edge cases where this iterative approach will not work.

CodePudding user response:

Assuming the graph is not directed (ie. G.is_directed() == False), the number of triangles can be efficiently found by finding nodes that are both neighbors of neighbors and direct neighbors of a same node. Pre-computing and pre-filtering the neighbors of nodes so that each triangle is counted only once helps to improve a lot the execution time. Here is the code:

nodeNeighbours = {
    # The filtering of the set ensure each triangle is only computed once
    node: set(n for n in edgeInfos.keys() if n > node)
    for node, edgeInfos in G.adjacency()
}

triangleCount = sum(
    len(neighbours & nodeNeighbours[node2])
    for node1, neighbours in nodeNeighbours.items()
    for node2 in neighbours
)

The above code is about 12 times faster than the original iterative solution on the example graph. And up to 72 times faster on nx.erdos_renyi_graph(15000, 0.005).

  • Related