Home > Net >  How can I efficiently search for multiple adjacent elements in a std::set?
How can I efficiently search for multiple adjacent elements in a std::set?

Time:08-01

I have a set containing a sortable data type. I need to be able to search to find all elements that lie between two unknown points within this set. For the sake of simplicity I'll use some pseudocode to describe this.

set = {
  (a, 0),
  (a, 3),
  (a, 8),
  (b, 2),
  (b, 5),
  (c, 0)
}

In reality, my code is implemented as follows

// This is highly simplified
template<typename A, typename B>
class overall_type {

  public:
    auto find(A min_a, A max_a) {
      // Find all elements between (min_a, -oo) and (max_a, oo)
      // How can I do this?
    }

    // A large number of other implementation details

  private:
  
    struct contained_type {
      A data_a;
      B data_b;

      auto operator<(contained_type const& other) -> bool {
          if (data_a < other.data_a) {
            return true;
          }
          if (data_a > other.data_a) {
            return false;
          }
          return data_b < other.data_b;
        }
    }

    std::set<contained_type> internal_data;
}

I want to find all elements within the range (for example) (b, -oo) <= e <= (b, oo). As I have implemented comparators for my data type, the data will be placed in-order within the set, meaning that I just need to find a starting point and an ending point.

The problem I have is that I cannot be sure what the precise starting point is. In the example above, the lowest element is (b, 1), and therefore a simple find() operation won't suffice. Another constraint is that I don't know the minimum values for any of the types of elements that are contained within my data, because all the types are templates. I instead need to be able to search using incomplete data.

I have also considered implementing my own binary search, but as sets don't seem to be random accessible, I cannot easily do so whilst using one.

The only alternative I have considered is using a std::vector and implementing my own binary search over it, and adding code to insert to the correct position to keep it sorted, however the under-the-hood implmenetation of vectors means that I wouldn't meet the worst-case time complexity requirements of my project.

Apologies if I have overlooked a clear solution here - my university has taught the STL exceptionally poorly, so there may be a data structure designed for this that I simply haven't heard of and which my numerous searches didn't uncover.

How can I create a solution that matches the above criteria?

CodePudding user response:

Something along these lines, perhaps. This relies on the capability, added in C 14, of std::set::equal_range et al to perform heterogeneous searches, given a comparator that has overloads for combinations of different types.

template<typename A, typename B>
class overall_type {

  public:
    auto find(A min_a, A max_a) {
        return internal_data.equal_range(RangeCover{min_a, max_a});
    }    

  private:
    using contained_type = std::pair<A, B>;
  
    struct RangeCover {
        A& min_a;
        A& max_a;
    };

    struct Compare {
        using is_transparent = int;

        bool operator()(contained_type const& l, contained_type const& r) const {
            return l < r;
        }

        bool operator()(contained_type const& l, RangeCover const& r) const {
            return l.first < r.min_a;
        }

        bool operator()(RangeCover const& l, contained_type const& r) const {
            return l.max_a < r.first;
        }
    };

    std::set<contained_type, Compare> internal_data;
};

Demo

Compare makes an instance of RangeCover compare equivalent to all instances of contained_type that have the first component in the range, ignoring the second one.

CodePudding user response:

The following solution is not optimal in performance if you have large numbers of equal data_a values, but it is simple and does not require anything special for type B other than that it is default constructible. If you add transparent comparison to allow lower_bound() to work directly with A alone, it won't even require B to be default constructible.

auto find(A min_a, A max_a) {
  // Find all elements between (min_a, -oo) and (max_a, oo)
  contained_type min_pair{min_a}; // data_b is default constructed
  auto iter = internal_data.lower_bound(min_pair);

  // linear search backward for true begin point
  auto begin = iter;
  for (; begin != internal_data.begin(); --begin) {
    if (begin->data_a < min_a) {
        begin;
      break;
    }
  }

  // linear search forward for true end point
  // [left as an exercise for the reader]
}
  • Related