Shouldn't the algorithm below be O(n^2) in the worst case? We iterate over the n elements of the array first, and in the worst case the array happens to contain only distinct elements, so when we check whether an element is present in the set s, we have to iterate over n elements again before adding it, hence O(n^2)?
# This function counts the distinct elements in arr
def countDistinct(arr, n):
    # Create an empty set
    s = set()
    # Traverse the input array
    res = 0
    for i in range(n):
        # If not present, put it in the
        # set and increment the result
        if arr[i] not in s:
            s.add(arr[i])
            res += 1
    return res

# Driver code
arr = [6, 10, 5, 4, 9, 120, 4, 6, 10]
n = len(arr)
print(countDistinct(arr, n))
CodePudding user response:
You are correct that the worst-case time complexity of a set lookup is O(n), but the trigger is hash collisions, not distinctness by itself. If every element in the array were the same, there would be no hash collisions: each membership check would be O(1) and only one insertion would happen, for an overall O(n) runtime. If every element were distinct and also collided with every key already in the set during each membership check/insertion, those operations would each be O(n), giving O(n^2) overall.
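To make that collision scenario concrete, here is a minimal sketch (not part of the original answer; BadHash and count_distinct_timed are hypothetical names introduced for illustration): a class whose __hash__ is a constant, so every distinct value lands in the same bucket and each membership check must compare against everything already stored.

import time

class BadHash:
    # Hypothetical class for illustration: every instance returns the
    # same hash, so set lookups/inserts must probe past and compare
    # against every element already stored.
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 0  # constant hash -> maximal collisions
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value

def count_distinct_timed(values):
    # Same membership-check-then-add loop as in the question
    s = set()
    start = time.perf_counter()
    for v in values:
        if v not in s:
            s.add(v)
    return time.perf_counter() - start

n = 2000
print(count_distinct_timed(range(n)))                        # well-hashed ints: roughly O(n) total
print(count_distinct_timed([BadHash(i) for i in range(n)]))  # all-colliding: roughly O(n^2) total

On a typical CPython build the second call is dramatically slower, even though both inputs contain n distinct values.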
CodePudding user response:
Your conclusion is correct in my opinion, but the reasoning is not completely correct:
"we have to iterate over n elements again before adding that element"
This is not necessary. In CPython, set is implemented as a hash table rather than a linked list (or another structure that requires O(n) time to check for and insert an element). Only when severe hash collisions occur (for example, all values are different but their hash values are the same) can the time complexity of adding an element reach O(n).
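As a quick sanity check of this point (a sketch added here, assuming CPython, where small non-negative ints hash to themselves): the distinct values in the question's array all produce distinct hashes, so they occupy different buckets and each membership check is O(1) on average.

arr = [6, 10, 5, 4, 9, 120, 4, 6, 10]
print({v: hash(v) for v in set(arr)})
# In CPython this prints {4: 4, 5: 5, 6: 6, 9: 9, 10: 10, 120: 120}
# (order may vary): distinct values, distinct hashes, so no
# pathological collisions and no O(n) probing per lookup.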