Most efficient paradigm for checking if a key exists in a C++ std::unordered_map?

Time:02-19

I am relatively new to modern C++ and working with an unfamiliar code base. There is a function that takes a std::unordered_map and checks whether a key is present in the map. The code is roughly as follows:

uint32_t getId(std::unordered_map<uint32_t, uint32_t> &myMap, uint32_t id) 
{
   if(myMap.contains(id))
   {
     return myMap.at(id);
   }
   else
   {
     std::cerr << "\n\n\nOut of Range error for map: "<< id << "\t not found" << std::flush;
     exit(74);
   }
}

It seems like calling contains() followed by at() is inefficient, since it requires two lookups. So my question is: what is the most efficient way to accomplish this? I also have a follow-up question: assuming the map is fairly large (~60k elements) and this function is called frequently, how problematic is the above approach?

After some searching, it seems like the following paradigms are more efficient than the above, but I am not sure which would be best.

  • Calling myMap.at() inside of a try-catch construct

    • Pros: at automatically throws std::out_of_range if the key does not exist
    • Cons: try-catch is apparently fairly costly and also constrains what the optimizer can do with the code
  • Use find

    • Pros: One call, no try-catch overhead
    • Cons: Involves using an iterator; more overhead than just returning the value
    auto findit = myMap.find(id);
    if(findit == myMap.end())
    {
      //error message;
      exit(74); 
    }
    else
    {
      return findit->first;
    }
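For comparison, the try-catch variant from the first bullet might look like the sketch below (the function name getIdWithCatch is made up for illustration). at() performs a single lookup and throws std::out_of_range on a miss, so there is no double lookup, but the exception machinery carries its own cost on the failure path.

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <unordered_map>

// Sketch of the try/catch paradigm: one lookup, exception on a miss.
uint32_t getIdWithCatch(const std::unordered_map<uint32_t, uint32_t>& myMap,
                        uint32_t id)
{
    try {
        return myMap.at(id); // single lookup; throws if id is absent
    } catch (const std::out_of_range&) {
        std::cerr << "Out of Range error for map: " << id << " not found\n";
        std::exit(74);
    }
}
```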

CodePudding user response:

You can do

// stuff before
{
    auto findit = myMap.find(id);
    if ( findit != myMap.end() ) {
        return findit->first;
    } else {
       exit(74);
    }
}
// stuff after

or with the C++17 init-statement syntax

// stuff before
if ( auto findit = myMap.find(id); findit != myMap.end() ) {
    return findit->first;
} else {
   exit(74);
}
// stuff after

Both confine the iterator to a local scope. Since the iterator use is almost certainly optimized away, I would go with this approach. Doing a second hash computation will almost surely be slower.

Also note that findit->first returns the key, not the value. I was not sure what you expect the code to do, but one of the code snippets in the question returns the value, while the other returns the key.
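Putting those two points together, a single-lookup version of getId that returns the mapped value (it->second) rather than the key (it->first), matching the original function's apparent intent, might look like:

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <unordered_map>

// Single find() lookup; returns the value, exits on a missing key.
uint32_t getId(const std::unordered_map<uint32_t, uint32_t>& myMap,
               uint32_t id)
{
    if (auto it = myMap.find(id); it != myMap.end()) {
        return it->second; // the mapped value; it->first would be the key
    }
    std::cerr << "Out of Range error for map: " << id << " not found\n";
    std::exit(74);
}
```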

CodePudding user response:

In case you don't get enough speedup just by removing the extra lookup, and there are millions of calls to getId, you can use an N-way (sharded) map to parallelize the id checks:

#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

template<int N>
class NwayMap
{
    public:
    NwayMap(uint32_t hintMaxSize = 60000)
    { 
         // hint about max size to optimize initial allocations
         for(int i=0;i<N;i++)
             shard[i].reserve(hintMaxSize/N);
    }
    
    void addIdValuePairThreadSafe(const uint32_t id, const uint32_t val)
    {
        // select shard
        const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N value

        std::lock_guard<std::mutex> lg(mut[selected]);

        auto it = shard[selected].find(id);
        if(it==shard[selected].end())
        {
            shard[selected].emplace(id,val);
        }
        else
        {
            // already added, update?
        }
    }

    uint32_t getIdMultiThreadSafe(const uint32_t id)
    {
        // select shard
        const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N value

        // lock only the selected shard, others can work in parallel
        std::lock_guard<std::mutex> lg(mut[selected]);

        auto it = shard[selected].find(id);
        // we expect it to be found, so get it quicker
        // without going "else"
        if(it!=shard[selected].end())
        {
              return it->second;
        }
        else
        {
              exit(74);
        }
    }

    private:
    std::unordered_map<uint32_t, uint32_t> shard[N];
    std::mutex mut[N];
};

Pros:

  • if you serve each shard's getId from its own CPU thread, you benefit from N times the L1 cache capacity.

  • even in a single-threaded use case, you can still interleave multiple id checks and benefit from instruction-level parallelism, because checking id 0 follows an independent code path from checking id 1, and the CPU can execute them out of order (if the pipeline is deep enough)

Cons:

  • if many checks from different threads collide on the same shard, their operations are serialized and the locking mechanism adds latency

  • when id values are mostly strided, the distribution across shards is unbalanced, which makes the parallelization inefficient
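A condensed, self-contained sketch of the sharding idea: id % N picks a shard, and only that shard's mutex is held, so lookups for ids that land in different shards can proceed in parallel. (The class name, shard array, and 60000 size hint follow the answer above; the get/add interface here is simplified for illustration.)

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

template <int N>
class NwayMap {
public:
    explicit NwayMap(uint32_t hintMaxSize = 60000) {
        // hint about max size to reduce rehashing
        for (int i = 0; i < N; i++)
            shard[i].reserve(hintMaxSize / N);
    }

    void add(uint32_t id, uint32_t val) {
        const uint32_t s = id % N;           // select shard
        std::lock_guard<std::mutex> lg(mut[s]);
        shard[s].emplace(id, val);           // no-op if id already present
    }

    // Returns false instead of exiting, so the caller decides what a miss means.
    bool get(uint32_t id, uint32_t& out) {
        const uint32_t s = id % N;           // select shard
        std::lock_guard<std::mutex> lg(mut[s]); // others work in parallel
        auto it = shard[s].find(id);
        if (it == shard[s].end()) return false;
        out = it->second;
        return true;
    }

private:
    std::unordered_map<uint32_t, uint32_t> shard[N];
    std::mutex mut[N];
};
```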
