Is std::unordered_map thread safe in this case?


Assume that one thread continuously calls only the following function. Here, insert_data checks whether a key exists in the std::unordered_map, and if it does not exist, it adds a new key before appending the value.

void insert_data(int key, int value, std::unordered_map<int, std::vector<int>>& my_map)
{
    if (my_map.find(key) == my_map.end())
    {
        my_map[key] = std::vector<int>();
    }

    my_map[key].push_back(value);
}

Another thread repeatedly iterates over the std::unordered_map:

void iteration(std::unordered_map<int, std::vector<int>>& my_map)
{
    for (auto& [key, values] : my_map)
    {
        for (int value : values)
            std::cout << "key : " << key << " value : " << value << std::endl;
    }
}

Is the shared my_map thread safe if each of the above functions is executed in only one thread?

CodePudding user response:

No, that is not safe. Modifying a standard container while another thread reads it is a data race: an insertion can rehash the table and invalidate the iterators the reading thread is using, and the behavior is undefined.
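
To make the scenario concrete, here is a minimal sketch of the racy setup described in the question (the main function, loop bounds, and key range are illustrative, not part of the original post). It assumes the insert_data and iteration functions above; a tool such as ThreadSanitizer would flag the race:

#include <thread>
#include <unordered_map>
#include <vector>

int main()
{
    std::unordered_map<int, std::vector<int>> my_map;

    std::thread writer([&] {
        for (int i = 0; i < 1000; ++i)
            insert_data(i % 10, i, my_map);   // modifies the map
    });
    std::thread reader([&] {
        for (int i = 0; i < 1000; ++i)
            iteration(my_map);                // reads the map concurrently: data race
    });

    writer.join();
    reader.join();
}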

Also, your insert can be made more efficient:

void insert_data(int key, int value,
      std::unordered_map<int, std::vector<int>>& my_map)
{
    my_map[key].push_back(value);
}

The [] operator creates the value automatically if it is not present. Even if your code needs to know whether the entry is new, you can do better than the find-then-insert pattern:

void insert_data(int key, int value,
      std::unordered_map<int, std::vector<int>>& my_map)
{
    auto inserted = my_map.emplace(key, std::vector<int>{});
    inserted.first->second.push_back(value);
    bool new_entry = inserted.second;
}

This avoids the duplicate lookup. It constructs a temporary vector of zero size but that is cheap.
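
Since C++17 you can also use try_emplace, which does not construct the temporary vector at all when the key already exists. A minimal sketch, equivalent to the version above:

void insert_data(int key, int value,
      std::unordered_map<int, std::vector<int>>& my_map)
{
    // try_emplace only constructs the vector if the key is absent (C++17)
    auto [it, new_entry] = my_map.try_emplace(key);
    it->second.push_back(value);
}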

Simple fix

The simplest solution is to protect the whole thing with a mutex.

class Dict
{
    std::mutex mutex;
    std::unordered_map<int, std::vector<int>> map;
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(mutex);
        map[key].push_back(value);
    }
    void iteration()
    {
        std::lock_guard<std::mutex> lock(mutex);
        for(const auto& key_values: map)
            for(int value: key_values.second)
                std::cout << "key : " << key_values.first
                          << " value : " << value << '\n';
    }
};
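
For completeness, a minimal usage sketch of this Dict (the thread setup and loop counts are illustrative, not part of the original answer):

#include <thread>

int main()
{
    Dict dict;   // the mutex-protected Dict from above

    std::thread writer([&] {
        for (int i = 0; i < 1000; ++i)
            dict.insert_data(i % 10, i);   // safe: serialized by the mutex
    });
    std::thread reader([&] {
        for (int i = 0; i < 100; ++i)
            dict.iteration();              // safe, but can make the writer wait
    });

    writer.join();
    reader.join();
}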

The main issue is that the inserting thread can now block for an extended period while iteration() holds the mutex and performs I/O.

Buffered fix

To avoid these long latencies, we should decouple the two threads as much as possible. Something like this:

class Dict
{
    std::unordered_map<int, std::vector<int>> map;
    std::mutex deferred_mutex;
    std::unordered_map<int, std::vector<int>> deferred;
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(deferred_mutex);
        deferred[key].push_back(value);
    }
    void iteration()
    {
        std::unordered_map<int, std::vector<int>> new_elements;
        std::unique_lock<std::mutex> deferred_lock(deferred_mutex);
        deferred.swap(new_elements);
        deferred_lock.unlock();
        for(auto& [key, new_values]: new_elements) {
            std::vector<int>& values = map[key];
            values.insert(values.end(), new_values.begin(),
                          new_values.end());
        }
        for(const auto& key_values: map)
            for(int value: key_values.second)
                std::cout << "key : " << key_values.first
                          << " value : " << value << '\n';
    }
};

Basically, we keep new elements in a separate container and merge them into the main map the next time iteration() runs. Compared to the cost of doing I/O, the extra work for the iteration() thread should be negligible.

Instead of a second unordered_map, using a std::vector<std::pair<int, int>> of key-value pairs for the deferred buffer would likely be more efficient (a sketch follows below), but that requires benchmarking and knowledge of how often the same keys repeat.
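
A minimal sketch of that variant, assuming the same class layout as the buffered version above; only the deferred buffer changes:

#include <iostream>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

class Dict
{
    std::unordered_map<int, std::vector<int>> map;
    std::mutex deferred_mutex;
    std::vector<std::pair<int, int>> deferred;   // flat buffer instead of a second map
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(deferred_mutex);
        deferred.emplace_back(key, value);       // cheap append, no hashing under the lock
    }
    void iteration()
    {
        std::vector<std::pair<int, int>> new_elements;
        {
            std::lock_guard<std::mutex> lock(deferred_mutex);
            deferred.swap(new_elements);
        }
        for (auto& [key, value] : new_elements)
            map[key].push_back(value);
        for (const auto& key_values : map)
            for (int value : key_values.second)
                std::cout << "key : " << key_values.first
                          << " value : " << value << '\n';
    }
};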
