C++ unordered_set string hash time complexity?


Why is the worst-case complexity of inserting into a set stated as linear in the size of the container, and not in the size of the element itself?

I am specifically talking about strings. If I have a set of strings of size m, and I insert a new string of size x, I assume the insert operation will need to read all x characters of the string in order to compute its hash key. So why don't we count that time?

And if there is another string of size 1000*x, will insertion still take time proportional to m in the worst case? Regardless of the size of the string, the time is O(m)? How?

CodePudding user response:

This is a great question and it hits at some nuances in the way we analyze the cost of operations on hash tables.

There are actually several different ways to think about this. The first is to measure the runtime of hash table operations purely as a function of the size of the table. From that perspective, the expected cost of an insertion, lookup, or deletion does not scale with the number of elements stored: if we let n denote the number of elements in the table, then each of these operations costs O(1) on average, because there is no dependency on n.

In this sense, the big-O notation here is meant to be interpreted as "if the only variable is n, the number of elements in the table, how will things scale?" But that leaves a lot to be desired, since it completely ignores the cost of comparing strings for equality, the cost of evaluating the hash function, etc.

If you factor those details in, then yes, you're correct - the cost of looking up, inserting, or deleting a string of length m from a hash table with n elements is O(m), not O(1).

The perspective I've always found most helpful is the following. When a hash table says that all operations run in time O(1), what it really means is that each operation requires only O(1) total hash evaluations and comparisons. To work out the total cost, you then multiply that O(1) by the cost of a single hash evaluation or comparison, which in the case of strings of length m is O(m). That gives an overall runtime of O(m), which matches your intuition.

CodePudding user response:

The standard gives the complexity as a function of the number of elements in the container in terms of primitive operations on the element type. These primitive operations are supplied by the instantiation.

All of the complexity requirements in this Clause are stated solely in terms of the number of operations on the contained objects. [Example: The copy constructor of type vector<vector<int>> has linear complexity, even though the complexity of copying each contained vector<int> is itself linear. — end example]

For example here's how the complexity of operator== on unordered associative containers is described:

For unordered_set and unordered_map, the complexity of operator== (i.e., the number of calls to the == operator of the value_type, to the predicate returned by key_eq(), and to the hasher returned by hash_function()) is proportional to N in the average case and to N² in the worst case, where N is a.size().

If you require complexity in terms of other operations, such as small integer arithmetic operations, you can derive it yourself if you know the complexity of each operation on the element type in those terms.
