Reduce string matching time in list of string

Time:06-30

I have a list of strings with around 100k entries, and it may grow in the future. For every input, I have to search this list for an exact match.

usr_input = "find_word"
check_list = ["first_word", "second_word"]  # around 100k entries

# What I am doing right now
if usr_input in check_list:
    print("Found word in list")

This works fine for a smaller dataset. But now that the list has grown to 100k entries, it is taking a toll on my application: response time sometimes rises to about 1 minute when we have a lot of entries to process.

Is there any way to optimize this operation?

Answer:

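Is using a set instead of a list an option, i.e. does it matter whether a string appears once or multiple times (a set keeps only one copy of each)? Because a set uses hashing, a membership test takes O(1) time on average instead of the O(n) element-by-element scan a list needs, so the lookup is far more efficient. Build the set once and reuse it for every input; converting the list to a set on each lookup would cost O(n) again.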
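A minimal sketch of the set approach, assuming duplicates in the list don't matter. The sample words and the helper name is_known are hypothetical; the set is built once and then reused, and a set intersection handles a whole batch of inputs in one call.

```python
# Hypothetical sample data; in practice this is the ~100k-entry list.
check_list = ["first_word", "second_word", "third_word"]

# Build the set ONCE (O(n)), then reuse it for every lookup.
check_set = set(check_list)

def is_known(word: str) -> bool:
    """Exact-match test; average O(1) because the set is a hash table."""
    return word in check_set

usr_input = "find_word"
if is_known(usr_input):
    print("Found word in list")

# Many inputs at once: the intersection keeps only the exact matches.
inputs = ["second_word", "find_word", "first_word"]
matches = check_set.intersection(inputs)
print(sorted(matches))
```

With these sample values nothing is printed for "find_word", and the batch check prints ['first_word', 'second_word'].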

Answer:

Maybe you can convert the list to a dict with a dict comprehension and look up by key, something like this:

check_dict = {key: 1 for key in check_list}  # build once; dict keys are hashed

Then look up the key:

if usr_input in check_dict:  # average O(1) hash lookup, no scan of the list
    print("Found word in list")
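To compare the approaches on your own machine, here is a rough timeit sketch. The generated words are hypothetical stand-ins for the real data, and dict.fromkeys is used only as a shorter equivalent of the dict comprehension above. A missing word is the worst case for the list, because every element has to be compared.

```python
import timeit

# Hypothetical synthetic data standing in for the real ~100k-entry list.
check_list = [f"word_{i}" for i in range(100_000)]
check_set = set(check_list)
# dict.fromkeys builds the same mapping as {key: 1 for key in check_list}.
check_dict = dict.fromkeys(check_list, 1)

usr_input = "find_word"  # absent, so the list check compares every element

timings = {}
for name, container in [("list", check_list), ("set", check_set), ("dict", check_dict)]:
    # Time 100 membership tests; the default argument binds this iteration's container.
    timings[name] = timeit.timeit(lambda c=container: usr_input in c, number=100)
    print(f"{name:>4}: {timings[name]:.6f} s for 100 lookups")
```

Absolute numbers depend on the machine. What matters is that the list time grows with the list length while the set and dict times stay roughly flat. A set does the same job as the dict without storing dummy values.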

Or you can use pandas.
