Iterating through large files without for loops in Python

I have two large files with 16000 entries. I want to iterate through them, compare four variables, and perform some calculations whenever there is a match. The files represent the same set of models but contain somewhat different output, so every model in file 1 has a match in file 2.

File 1 and file 2 are tables where each column has a header. For instance, file 1 is

#	a1	b1	c1	d1	age
1	5	33	22.1	1e20	10
2	2	56	85.6	2e30	1
...	...	...	...	...	...

And file 2 is

#	a2	b2	c2	d2	length
1	9	98	34.8	3e15	40
2	12	22	10.2	5e10	20
...	...	...	...	...	...

Essentially, a1, b1, c1, d1 and a2, b2, c2, d2 represent the same values/models but in a different order. I want to match them and create a new table that will look like this:

#	a	b	c	d	length	age
...	...	...	...	...	...	...

Intuitively, I'd create two for loops of this type:

for i in range(len(file1)):
    for j in range(len(file2)):
        if a1[i] == a2[j] and b1[i] == b2[j] and c1[i] == c2[j] and d1[i] == d2[j]:
            pass  # some calculations on age and length

I wonder if there is a more robust way that would avoid having a nested for loop.

Update: I forgot to mention that I need to match on the a, b, c, d terms because they describe the model parameters.

CodePudding user response：

You can use itertools.product to merge those nested loops into one loop. It is still a loop and still compares every row of file 1 with every row of file 2, but it reads a bit nicer.

import itertools

for i, j in itertools.product(range(len(file1)), range(len(file2))):
    if a1[i] == a2[j] and b1[i] == b2[j] and c1[i] == c2[j] and d1[i] == d2[j]:
        pass  # some calculations stored and returned
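
To make the "still a loop" point concrete, here is a small sketch (the lengths n1 and n2 are made-up stand-ins for len(file1) and len(file2)) showing that itertools.product visits exactly the same index pairs as the nested loops, so the amount of work does not change:

```python
import itertools

# Hypothetical small lengths standing in for len(file1) and len(file2).
n1, n2 = 4, 3

# Index pairs visited by the nested for loops.
nested = [(i, j) for i in range(n1) for j in range(n2)]

# Index pairs visited by the single itertools.product loop.
flat = list(itertools.product(range(n1), range(n2)))

print(nested == flat)  # same pairs, in the same order
print(len(flat))       # n1 * n2 comparisons either way
```

With 16000 rows in each file that would be 16000 * 16000 comparisons regardless of which form is used.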

CodePudding user response：

If your goal is to avoid the nested loop but you are OK with still iterating over each file once, would this work for your case?

s1 = set()
for i in range(len(file1)):
    # use tuples, not lists: set members must be hashable
    s1.add((a1[i], b1[i], c1[i], d1[i]))
for j in range(len(file2)):
    if (a2[j], b2[j], c2[j], d2[j]) in s1:
        pass  # perform something
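
If you also need the age from file 1 when a match is found, the same idea can be sketched with a dict keyed by (a, b, c, d) instead of a set. This is only a rough sketch under some assumptions: the rows below are made-up sample data, the names file1_rows, file2_rows, age_by_key and merged are hypothetical, and it assumes the four values compare exactly equal in both files (which should hold if c and d are parsed from identical text, but not if they differ by rounding).

```python
# Hypothetical sample rows, not taken from the real files.
# file 1 rows: (a, b, c, d, age); file 2 rows: (a, b, c, d, length)
file1_rows = [
    (5, 33, 22.1, 1e20, 10),
    (2, 56, 85.6, 2e30, 1),
]
file2_rows = [
    (2, 56, 85.6, 2e30, 20),
    (5, 33, 22.1, 1e20, 40),
]

# One pass over file 1: index the age by the (a, b, c, d) key.
age_by_key = {(a, b, c, d): age for a, b, c, d, age in file1_rows}

# One pass over file 2: look each key up instead of scanning file 1 again.
merged = []
for a, b, c, d, length in file2_rows:
    key = (a, b, c, d)
    if key in age_by_key:
        # Row layout of the combined table: a, b, c, d, length, age
        merged.append((a, b, c, d, length, age_by_key[key]))

print(merged)
```

Each file is read once, so the work grows with len(file1) + len(file2) rather than len(file1) * len(file2).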