I have a list of strings with around 100k entries, which might grow in the future. For every input I have to search this list for an exact match.
usr_input = "find_word"
check_list = ["first_word", "second_word"]  # around 100k entries
# What I am doing right now
if usr_input in check_list:
    print("Found word in list")
This works fine for smaller datasets, but as the size grew to 100k it started taking a toll on my application, with response times sometimes reaching ~1 minute when there are a lot of inputs to process.
Is there any way to optimize this operation?
CodePudding user response:
Is using a set instead of a list an option, i.e. does it matter whether strings appear only once or multiple times? Since a set uses hashing, the membership test is a lot more efficient.
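A minimal sketch of that approach, reusing the names from the question:

usr_input = "find_word"
check_list = ["first_word", "second_word"]  # around 100k entries

# Build the set once, outside the per-input hot path.
check_set = set(check_list)

# Average O(1) hash lookup instead of an O(n) list scan.
if usr_input in check_set:
    print("Found word in list")

The conversion itself is O(n), so this pays off as long as the set is built once and reused across many lookups.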
CodePudding user response:
Maybe you can convert the list to a dict using a dict comprehension, and then look up by key, something like this:
check_dict = {key:1 for key in check_list}
And then find by key:
if check_dict.get(usr_input, None):
    print("Found word in list")
Or you can use pandas, e.g. a vectorized membership check on a Series.
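A rough sketch of the pandas route (assuming pandas is available; Series.isin is its vectorized membership test):

import pandas as pd

# Build the Series once and reuse it across lookups.
check_series = pd.Series(check_list)

# isin returns a boolean Series; any() collapses it to a single bool.
if check_series.isin([usr_input]).any():
    print("Found word in list")

Note this still scans all 100k entries per lookup (just in C rather than in Python), so for repeated exact-match lookups a set or dict should still win.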