Home > Back-end >  Efficiently search a long list of lists
Efficiently search a long list of lists

Time:03-08

I have a long list of hexahedral point coordinates, for example:

[[0,    57,  2948,    56,   449, 14953, 15002,  5446],
[  449, 14953, 15002,  5446,   450, 14954, 15003,  5495],
[  450, 14954, 15003,  5495,   451, 14955, 15004,  5544],
[  451, 14955, 15004,  5544,   452, 14956, 15005,  5593],
[  452, 14956, 15005,  5593,   453, 14957, 15006,  5642],
....]

Each row defines a hexahedron cell, and by iterating over each cell, I extract the defining faces of the cell (6 faces), and add each face to a list processed_faces

enter image description here

All of this is fine, but because some cells are sharing the same face, I needed a way to check if the current face has been processed before or not (also needed later for connectivity calculations, so no way out of this).

 for cell_id, cell in enumerate(cells):
        faces = get_faces(cell)
        for face in faces:
            # have we met `face` before?
            if not self.face_exists(face):
                self.add_face(face)

            # link the face to the cell who shares it
            self.link_face_to_cell(face, cell_id)

The bottleneck of my code is face_exists(), for which I tried two different approaches:

  1. Sort each processed face, convert it to a tuple (to be hashable), add the tuple to a list of tuples, and simple check if a face exists by tuple(sorted(face)) in faces. But this is AWFULLY SLOW.

  2. Implement a trie data structure, which works just fine (and about 100x faster than method 1), but I am not satisfied with the performance

    class TrieNode:
        __slots__ = ['index', 'is_end', 'children']
        def __init__(self, index: int):
            self.index = index
            self.is_end = False
            self.children = {}
    
    
    class Trie:
        __slots__ = ['root']
        def __init__(self):
            self.root = TrieNode(-1)
    
        def insert(self, face: List[int]):
            node = self.root
    
            for point in face:
                if point in node.children:
                    node = node.children[point]
                else:
                    new_node = TrieNode(point)
                    node.children[point] = new_node
                    node = new_node
    
            node.is_end = True
    
    
        def search(self, face: List[int]) -> bool:
            node =self.root
    
            for point in face:
                if point in node.children:
                    node = node.children[point]
                    continue
                else:
                    return False
    
            if node.is_end:
                return True
    
            return False
    

Sorry for the long question, in summary I am looking for two things:

  1. An alternative data structure for mesh storage, that allows my fast search needs. OR:
  2. If trie is suitable for my application, is there a python library (hopefully with a C backend) that allows storing list of numbers, because libraries I came across are mostly designed for strings.

CodePudding user response:

One of possible drop-in solutions would be to use a set/frozenset for storing faces. Sets have great lookup performance and are simple to use here.

  • Related