Home > Net >  Is there a more efficient method than nested loops to search through a list and then find correspond
Is there a more efficient method than nested loops to search through a list and then find correspond

Time:06-30

I have a list of values and I need to search through an ~1 GB text file for those values and then pull a corresponding value from the text file in the same row. Is there a more efficient method than using nested for loops where I would be going through my list of values and then going into my nested loop where I would be iterating through the text file to find the corresponding matches?

    for name in name_list:
        with open("test.txt") as infile:
            for line in infile:
                currentline = line.split("|")
                if name == currentline[0]:
                    print(currentline[2])

CodePudding user response:

This is exactly what a dictionary is for. Assuming parts[0] is unique:

dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        dic[parts[0]] = parts[2]
for name in name_list:
    if name in dic:
        print(dic[name])

if it isn't unique, you'll have to keep a list for each dict entry:

dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        if parts[0] in dic:
            dic[parts[0]].append(parts[2])
        else:
            dic[parts[0]] = [parts[2]]
for name in name_list:
    if name in dic:
        for matching in dic[name]:
            print(matching)

this drastically reduces the runtime: assuming your file has n entries and your name_list has m entries, you had a complexity of O(n * m) before - now you have O(n m), since hash map access is constant time - you don't perform a linear search anymore!

In practice, the speedup will be even larger since your previous code reread the file within the O(m) loop.

Furthermore, as suggested by @KellyBundy, you can use a set with the name_list to only collect values for names you're interested in:

dic = {name: [] for name in name_list}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        if parts[0] in dic:
            dic[parts[0]].append(parts[2])
for matching_list in dic.values():
    for matching in matching_list:
        print(matching)

CodePudding user response:

The the loop through the name_list is faster than that one in the files. So you can invert loops and using set to check the value:

names_to_search_for = set(name_list)
with open("test.txt") as infile:
   for line in infile:
      currentline = line.split("|")
         if currentline[0] in names_to_search_for:
             print(currentline[0], currentline[2])

This should reduce the access to the file and the speed.

CodePudding user response:

You can put name in a set, where lookups are quick. Then as you scan the lines, you can check if your line part is in names:

names = set(name_list)

with open("test.txt") as infile:
    for line in infile:
        split_line = line.split("|")
        if split_line[0] in names:
            print(split_line[2])
  • Related