Python Nested For Loop with Two Dictionaries, Inner Loop Not Resetting

Time:09-17

I am trying to compare specific values between two CSV files. I read both files with csv.DictReader and use a nested for loop, with each loop going through one of the readers. Normally the inner for loop starts over and runs through all of its items on each iteration of the outer loop, but that is not what happens here. In my debugger I can see that on the second iteration of the outer loop, the code skips past the inner loop entirely, as if there were nothing to loop over. Is this due to a property of looping through the DictReader object? If so, how can I fix it? I have included a snippet of my code below.

with open('csv1.csv', 'r') as inFile1:
   with open('csv2.csv', 'r') as inFile2:
      reader1 = csv.DictReader(inFile1)
      reader2 = csv.DictReader(inFile2)

      for row1 in reader1:
         for row2 in reader2:
            if row1['key1'] == row2['key2']:
               pass  # [Perform other operations here]

CodePudding user response:

A csv.DictReader is an iterator over its file, and once you have exhausted an iterator it doesn't automatically reset.
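
The effect can be reproduced without any files. The sketch below is only an illustration: it feeds made-up in-memory data (via io.StringIO, standing in for a real file) to csv.DictReader and loops over the same reader twice. The second pass finds nothing, which is what happens to the inner loop in the question.

```python
import csv
import io

# Made-up sample data standing in for a real CSV file.
sample = io.StringIO("key2,value\na,1\nb,2\n")
reader = csv.DictReader(sample)

first_pass = [row["key2"] for row in reader]   # consumes every row
second_pass = [row["key2"] for row in reader]  # reader is already exhausted

print(first_pass)   # ['a', 'b']
print(second_pass)  # []
```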

Instead, you must provide a new inner iterator for each outer iteration.

import csv

with open('csv1.csv', 'r') as inFile1:
    reader1 = csv.DictReader(inFile1)

    for row1 in reader1:
        # Reopen the second file so the inner loop starts from the first row again.
        with open('csv2.csv', 'r') as inFile2:
            reader2 = csv.DictReader(inFile2)
            for row2 in reader2:
                if row1['key1'] == row2['key2']:
                    pass  # [Perform other operations here]

Or, if the file sizes are reasonable, simply read the files into memory before you process them:

import csv

with open('csv1.csv', 'r') as inFile1, open('csv2.csv', 'r') as inFile2:
    # list() consumes each reader once and keeps the rows for repeated looping.
    csv1 = list(csv.DictReader(inFile1))
    csv2 = list(csv.DictReader(inFile2))

for dict1 in csv1:
    for dict2 in csv2:
        if dict1['key1'] == dict2['key2']:
            pass  # [Perform other operations here]

CodePudding user response:

@djones's answer works but is inefficient for large files, since it takes O(n × m) time, where n and m are the numbers of rows in the two files.

The problem can be solved in linear time if you build a dict from the first file, with the value of key1 as the key, and then iterate through the rows of the second file to look up each value of key2 in the dict. Since dict lookups cost O(1) on average, the overall time complexity becomes O(n + m) instead:

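import csv

# Index the first file by key1 (a later duplicate key overwrites an earlier one).
with open('csv1.csv', 'r') as inFile1:
    rows1 = {row1['key1']: row1 for row1 in csv.DictReader(inFile1)}
# Stream the second file and look up each key2 in the index.
with open('csv2.csv', 'r') as inFile2:
    for row2 in csv.DictReader(inFile2):
        key = row2['key2']
        if key in rows1:
            print(rows1[key], row2)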
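
As a self-contained illustration of the same idea, here is a minimal sketch that runs on made-up in-memory data instead of files. The helper name join_on_key, the sample columns name and score, and the sample rows are all hypothetical and not part of the question; the sketch also assumes the key1 values are unique.

```python
import csv
import io

def join_on_key(file1, file2, key1="key1", key2="key2"):
    """Return (row1, row2) pairs whose key columns match.

    Hypothetical helper; assumes the values in the key1 column are unique.
    """
    # Build the lookup table from the first file in one pass.
    index = {row[key1]: row for row in csv.DictReader(file1)}
    pairs = []
    # Stream the second file; each lookup is O(1) on average.
    for row2 in csv.DictReader(file2):
        row1 = index.get(row2[key2])
        if row1 is not None:
            pairs.append((row1, row2))
    return pairs

# Made-up sample data standing in for csv1.csv and csv2.csv.
f1 = io.StringIO("key1,name\n1,alice\n2,bob\n")
f2 = io.StringIO("key2,score\n2,90\n3,75\n1,60\n")

pairs = join_on_key(f1, f2)
for row1, row2 in pairs:
    print(row1["name"], row2["score"])
# bob 90
# alice 60
```

Matches come out in the order of the second file, and the row with key 3 is skipped because nothing in the first file has that key.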

If the keys of the two tables in the two CSV files are in a many-to-many relationship, however, you can read the first file into a dict of lists instead. You can then still look up the keys from the second file in constant time and complete the overall process in a linear time complexity of O(n + m + k), where n and m are the numbers of records in the two files, and k is the number of matches:

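import csv

# Group the rows of the first file by key1, keeping every row for each key.
rows1 = {}
with open('csv1.csv', 'r') as inFile1:
    for row1 in csv.DictReader(inFile1):
        rows1.setdefault(row1['key1'], []).append(row1)
# For each row of the second file, visit all first-file rows that share its key.
with open('csv2.csv', 'r') as inFile2:
    for row2 in csv.DictReader(inFile2):
        for row1 in rows1.get(row2['key2'], ()):
            print(row1, row2)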
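
The grouping step can also be written with collections.defaultdict, which is a swapped-in alternative to the setdefault call rather than anything the code above requires. The sketch below uses made-up in-memory data; the helper name join_many and the sample columns are hypothetical.

```python
import csv
import io
from collections import defaultdict

def join_many(file1, file2, key1="key1", key2="key2"):
    """Return every (row1, row2) pair with matching keys (hypothetical helper)."""
    # Group first-file rows by key; duplicate keys are kept, not overwritten.
    groups = defaultdict(list)
    for row1 in csv.DictReader(file1):
        groups[row1[key1]].append(row1)
    pairs = []
    for row2 in csv.DictReader(file2):
        # .get() avoids creating empty groups for keys that have no match.
        for row1 in groups.get(row2[key2], ()):
            pairs.append((row1, row2))
    return pairs

# Made-up sample data: key 1 appears twice in each file, so it yields 4 pairs.
f1 = io.StringIO("key1,name\n1,alice\n1,anna\n2,bob\n")
f2 = io.StringIO("key2,score\n1,60\n1,70\n3,75\n")

pairs = join_many(f1, f2)
print(len(pairs))  # 4
```

Here n = 3, m = 3, and k = 4: both files are read once, and the only extra work is one step per matched pair.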

