Iterating through large files without for loops in Python

Time:10-09

I have two large files with 16,000 entries that I want to iterate through, comparing four variables between them and performing some calculations when they match. The files describe the same set of models but contain somewhat different output, so every model in file 1 has a match in file 2.

Both files are tables with a header naming each column. For instance, file 1 is:

# a1 b1 c1 d1 age
1 5 33 22.1 1e20 10
2 2 56 85.6 2e30 1
... ... ... ... ... ...

And file 2 is

# a2 b2 c2 d2 length
1 9 98 34.8 3e15 40
2 12 22 10.2 5e10 20
... ... ... ... ... ...
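Since both files use this layout, a small parser can turn each one into a list of records. The sketch below rests on a few assumptions: columns are whitespace-separated, the header line starts with `#`, and the first value on each data row is a row number that the header does not name (the header lists five names while each row has six values). `read_table` is a hypothetical helper, not part of any library.

```python
def read_table(lines):
    """Parse a '#'-headed, whitespace-separated table into a list of dicts.

    Assumes the first value on each data row is an unnamed row number and
    that every remaining value parses as a float.
    """
    header = None
    records = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            header = stripped[1:].split()  # column names after the '#'
            continue
        parts = stripped.split()
        values = [float(v) for v in parts[1:]]  # drop the leading row number
        if header is None or len(values) != len(header):
            raise ValueError(f"row does not match header: {line!r}")
        records.append(dict(zip(header, values)))
    return records


file1_text = """\
# a1 b1 c1 d1 age
1 5 33 22.1 1e20 10
2 2 56 85.6 2e30 1
"""
file1 = read_table(file1_text.splitlines())
print(file1[0])
```

For a file on disk, the open file object can be passed directly, e.g. `with open(path) as f: records = read_table(f)`, where `path` is whatever your file is called.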

Essentially, the columns a1, b1, c1, d1 and a2, b2, c2, d2 hold the same model parameters, but the rows appear in a different order in the two files. I want to match the rows and create a new table that looks like this:

# a b c d length age
... ... ... ... ... ... ...
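Once matched rows are available as `(a, b, c, d, length, age)` tuples, writing them in this target layout only requires the `#` header plus a leading row number on each line, mirroring the input files. This is a minimal sketch; `format_table` is a hypothetical helper and the sample row is invented for illustration.

```python
def format_table(rows):
    """Render (a, b, c, d, length, age) tuples as a '#'-headed table."""
    lines = ["# a b c d length age"]
    for n, row in enumerate(rows, start=1):
        # Leading row number, then the six values separated by spaces;
        # str() uses Python's default float formatting (e.g. 1e+20).
        lines.append(" ".join([str(n)] + [str(v) for v in row]))
    return "\n".join(lines)


print(format_table([(5, 33, 22.1, 1e20, 40, 10)]))
```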

My first instinct is to write two nested for loops like this:

for i in range(len(file1)):
    for j in range(len(file2)):
        if a1[i] == a2[j] and b1[i] == b2[j] and c1[i] == c2[j] and d1[i] == d2[j]:
            pass  # some calculations on age and length

If both files have 16,000 rows, this performs 16,000 × 16,000 = 256 million comparisons. Is there a more efficient and robust way that avoids the nested for loop?

Update: I forgot to mention that I need to match on all four of a, b, c, d, because together they describe the model parameters.

Answer:

You can use itertools.product to merge those nested loops into a single loop. It still visits every (i, j) pair, so the number of comparisons is unchanged, but the code reads a bit more cleanly.

import itertools

for i, j in itertools.product(range(len(file1)), range(len(file2))):
    if a1[i] == a2[j] and b1[i] == b2[j] and c1[i] == c2[j] and d1[i] == d2[j]:
        pass  # some calculations stored and returned
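To confirm that `itertools.product` only flattens the loop rather than reducing the work, one can compare the index pairs it yields with those produced by nested loops. The lengths below are small arbitrary stand-ins for `len(file1)` and `len(file2)`.

```python
import itertools

n1, n2 = 3, 4  # small stand-ins for len(file1) and len(file2)

# Pairs visited by the nested loops: i outer, j inner.
nested_pairs = [(i, j) for i in range(n1) for j in range(n2)]
# product advances the last range fastest, exactly like the inner loop.
product_pairs = list(itertools.product(range(n1), range(n2)))

print(nested_pairs == product_pairs)  # True
print(len(product_pairs))             # 12
```

Both forms visit all `n1 * n2` combinations in the same order, so for two 16,000-row files the `if` test still runs 256 million times.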

Answer:

If your goal is to avoid the nested loop and you are OK with still iterating over each file once, would this work for your case?

s1 = set()
for i in range(len(file1)):
    s1.add((a1[i], b1[i], c1[i], d1[i]))  # tuples are hashable; lists are not
for j in range(len(file2)):
    if (a2[j], b2[j], c2[j], d2[j]) in s1:
        pass  # perform something
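A set only tells you whether a file 2 row has a partner, not which file 1 row it is, so the matching `age` is not directly available for the calculation. A minimal variation that keeps the same parallel-list layout stores each file 1 row index in a dictionary keyed by the parameter tuple. Building the dictionary and doing the lookups each take one pass, and a lookup takes constant time on average. The short lists are invented toy data, and matching relies on exact equality, so the values must be written identically in both files.

```python
# Toy column lists standing in for the parsed files (values invented).
a1, b1, c1, d1 = [5, 2], [33, 56], [22.1, 85.6], [1e20, 2e30]
age = [10, 1]
a2, b2, c2, d2 = [2, 5], [56, 33], [85.6, 22.1], [2e30, 1e20]
length = [20, 40]

# Map each file 1 parameter tuple to its row index.
index1 = {(a1[i], b1[i], c1[i], d1[i]): i for i in range(len(a1))}

matches = []
for j in range(len(a2)):
    i = index1.get((a2[j], b2[j], c2[j], d2[j]))
    if i is not None:
        # Row i of file 1 and row j of file 2 describe the same model,
        # so age[i] and length[j] can be combined here.
        matches.append((i, j, age[i], length[j]))

print(matches)
```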