Is it possible to overload/modify 'find' of std::unordered_set to use a key that is a subs-CodePudding

I see little use of std::unordered_set or any hash table for that matter if the objects themselves are the keys: in order to search anything, you'd need to have the element in the first place. But if I already have the element, I don't need to search it (the exception being if I only need to check if it's a duplicate).

Much more interesting is the so called "heterogeneous lookup", the case where your data is an object containing multiple pieces of data, and treating one of those as key. For example I want to get all the employee data of one employee named "Pete Johnson" by doing auto it = employees.find("Pete Johnson").

I read about transparent comparators and tried creating my own hash function and 'equal_to' operator to use a subset of the data. Creating the set and adding objects works, but I still cannot get the 'find' function to work the way I want.

The easy solution for what I want is to just use std::unordered_map, but I hoped there would be a better way than to keep two copies of each key. Another workaround is to create a dummy object, copy the key into the dummy, then use that to do the lookup. However, I think the ideal solution would be if I had the option to provide a 'getKey' function to the constructor, so that 'find' can accept a naked key and use 'getKey' to compare it to the object to see if it matches.

std::string getKey(const userData_t& object){
  return object.name;
}

std::size_t userData_t_hash(const userData_t& object){
  return std::hash<std::string>()(getKey(object));
}

bool userData_t_equals_to(const userData_t& object, const std::string& str){
  return getKey(object) == str;
}

bool userData_t_equals_to(const userData_t& object, const userData_t& other){
  return getKey(object) == getKey(other);
}

CodePudding user response：

Why would I need to search and obtain (a reference to) the object from the hash table if I already had the object?

One reason is to find out if the object is in the hash table. Sometimes sets are used to eliminate duplicates from a collection. If sorting is not needed, then unordered_set might be the best choice for the task.

It would make more sense if I could provide a 'getKey' function [etc.]

You can in principle. The syntax is not exactly what you had in mind, but the functionality is available.

The hash of an object does not need to take into account all data in the object. The requirement is that if two objects should be considered the same, then they must have the same hash. If you define the hash of your object to be the hash of object.getKey() and define equality to be having the same key, then you could find a real object by constructing a dummy object with just the key set. If your objects represent employees and the key is the name (not the best choice, but sufficient for now), then you could construct an employee object with no data set other than the name, and use that to find the real employee object.

Admittedly, constructing a dummy object to set one field is a bit of a hassle. Hence, C 14 introduced find overloads that accept data that compares equivalent to elements of your set. This requires that the hash and compare types have a member type named is_transparent. For example, this member type exists in std::equal_to<void> (which is not the same as the default comparison, std::equal_to<Key>). With the appropriate hash and comparison definitions, you could use employees.find("Pete") to find the employee object whose name is "Pete".

(I'm going to leave further details as "out of scope" for the question that was asked. Some general information can be found in What are transparent comparators?.)

So one use case for unordered_set is a replacement for unordered_map when the values would necessarily contain the key and the values do not need to change.