I was trying to solve the "Hash Tables: Ransom Note" Hackerrank challenge in python. I come from a MATLAB background so I am used to working with arrays. The question is "Given the words in the magazine and the words in the ransom note, print Yes if he can replicate his ransom note exactly using whole words from the magazine; otherwise, print No."
I have two solutions to the problem; the first using arrays/lists which is what I would naturally write (and is what the input is given as!), and the second using a dictionary which feels unnatural to me. The issue is that the first approach times-out on a couple of the examples (so the second runs much quicker, and is the expected solution).
def checkMagazine(magazine, note):
# Using arrays
for word in note:
if word not in magazine:
print("No")
return
else:
magazine.remove(word)
print("Yes")
return
# Using dictionaries
dict = {}
for word in magazine:
dict[word] = dict.get(word,0) 1
for word in note:
if dict.get(word,0) == 0:
print('No')
return
else:
dict[word] -= 1
print('Yes')
return
Could someone give me pointers as to a) why the dictionary approach runs quicker and b) how to recognise problems where a dictionary would be the best solution.
CodePudding user response:
The key difference is that dictionaries are optimized for keyed access.
Locating a specific item in a list (e.g. doing word in magazine
or magazine.remove(word)
) is an O(n) operation that involves iterating over the entire list to look for the item.
Locating a specific item in a dictionary (e.g. dict.get(word)
or dict[word] -= 1
) is an O(1) operation.
Any time you have a collection of items where you will need to access a specific item by a particular identifier, a dictionary is the obvious choice. If you need to access items by position/order instead of identity (e.g. "pop the last item that was added"), a list is more appropriate.
CodePudding user response:
If magazine
is a list, every word not in magazine
takes O(n) time, because you have to walk through the entire list to see if word
is there or not. (And that bound is tight; when the word is indeed not in the list, you have to check every value to verify that.)
It also takes O(n) time to build your dictionary, but after that each lookup takes only O(1) time, as the value itself can be used to determine where in the dict it is, instead of having to compare it to every value in the dict
.
So use a dict
when you'll be performing enough lookups that the time you save with O(1) lookups makes up for the time spend building the dict
in the first place.