While parsing a file in Python, do some operation on a range of lines

Time:12-07

I'm reading a 15 GB file in Python. My code looks like this:

add_balance = {}  # address -> net balance
de_send = {}      # address -> number of times it appears as sender
de_rec = {}       # address -> number of times it appears as receiver

infile = open(file, "r")
count = 0
line = infile.readline()
num_lines = int(sum(1 for line in open(file)))
while line:
    if count % 2 == 0:
        # even lines: transaction header (contains tr), then the senders
        if count > num_lines:
            break
        fields = line.split(";")
        tr = int(fields[0].split(",")[1])
        for ff in fields[1:]:
            ffsplit = ff.split(",")
            address = int(ffsplit[0])
            amount = int(ffsplit[1])
            if address not in add_balance.keys():
                add_balance[address] = -amount
            else:
                add_balance[address] -= amount
            if address not in de_send.keys():
                de_send[address] = 1
            else:
                de_send[address] += 1
    else:
        # odd lines: the receivers
        fields = line.split(";")
        for ff in fields:
            ffsplit = ff.split(",")
            address = int(ffsplit[0])
            amount = int(ffsplit[1])
            if address not in add_balance.keys():
                add_balance[address] = amount
            else:
                add_balance[address] += amount
            if address not in de_rec.keys():
                de_rec[address] = 1
            else:
                de_rec[address] += 1
    count += 1
    line = infile.readline()

Now, when tr is in a certain range ([100000,200000], [200000,300000] and so on), I need to create a NetworkX graph for that range (adding the tr and address values in the range as nodes) and do some other operations, while still updating the dictionaries.

tr works like an index: starting from 1, it increases by 1 every two lines (that's the reason for the count%2==0 check).

I tried to write a function createGraph that creates the nodes for that range while reading the file. My problem is that every time I create a graph I start reading the file from the beginning, which is obviously not computationally efficient.

How can I, starting from a certain tr (let's say 100000), create a graph every 100000 tr inside the while loop?

CodePudding user response:

If the file never changes, you can precompute the position of the desired line using .tell, and then use the .seek method to move to that position and start working from there:

>>> with open("test.txt","w") as file: # demonstration file
        for n in range(10):
            print("line",n,file=file)

>>> desire_line=4
>>> position_line=0
>>> with open("test.txt") as file: # get the position right after the desired line
        for i,n in enumerate(iter(file.readline,"")):
            if i==desire_line:
                break
        position_line=file.tell()

>>> with open("test.txt") as file:
        file.seek(position_line)
        for line in file:
            print(line)

    
40
line 5

line 6

line 7

line 8

line 9

>>> 
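
If you need more than one jump target (for example the start of every 100000th transaction), the same idea can be extended to a small table of offsets built in a single pass. What follows is only a sketch under some assumptions: build_line_index and read_lines_from are made-up helper names, and the file is opened in binary mode so that an offset is simply the sum of the byte lengths of the preceding lines.

```python
import os
import tempfile


def build_line_index(path, step):
    """Return the byte offsets of lines 0, step, 2*step, ... (one pass)."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode: offsets are plain byte counts
        for line_number, raw in enumerate(f):
            if line_number % step == 0:
                offsets.append(position)
            position += len(raw)
    return offsets


def read_lines_from(path, offset, how_many):
    """Seek to a saved offset and read a few lines from there."""
    with open(path, "rb") as f:
        f.seek(offset)
        return [f.readline().decode().rstrip("\r\n") for _ in range(how_many)]


# demonstration file, written with "\n" line endings on every platform
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 newline="\n") as tmp:
    for n in range(10):
        print("line", n, file=tmp)
    demo_path = tmp.name

index = build_line_index(demo_path, step=4)   # offsets of lines 0, 4 and 8
chunk = read_lines_from(demo_path, index[1], 2)
os.remove(demo_path)
```

Here index[1] points at the start of line 4, so chunk holds "line 4" and "line 5". As with the single offset above, the table is only valid as long as the file does not change.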

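If the file does change, in particular in the lines before your desired point (which would invalidate the saved seek position), you can use the itertools module to get there instead. Note that islice still reads and discards the skipped lines, so it saves you the bookkeeping but not the I/O: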

>>> import itertools
>>> with open("test.txt") as file:
        for line in itertools.islice(file,5,None):
            print(line)

    
line 5

line 6

line 7

line 8

line 9

>>> 
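
Since what you actually want is one graph per block of tr values, another option is not to jump around at all: keep a single pass over the file and hand off the collected edges whenever tr crosses into the next block. This is a rough sketch, not a drop-in solution. The name edges_per_window is made up, the line layout is guessed from the code in the question (even lines carry tr and the senders, odd lines the receivers), and a plain list of (tr, address) pairs stands in for the graph; each list could then be passed to something like networkx.Graph().add_edges_from(edges).

```python
def edges_per_window(lines, window):
    """Yield (window_start, edges) each time tr moves into a new window.

    Assumed layout (taken from the question's code): even lines look like
    "x,tr;addr,amount;..." and odd lines like "addr,amount;...".
    """
    current_bucket = None
    edges = []
    tr = None
    for count, line in enumerate(lines):
        fields = line.strip().split(";")
        if count % 2 == 0:
            tr = int(fields[0].split(",")[1])
            bucket = tr // window
            if current_bucket is not None and bucket != current_bucket:
                # tr left the previous window: hand over what was collected
                yield current_bucket * window, edges
                edges = []
            current_bucket = bucket
            fields = fields[1:]  # skip the header field, keep the senders
        for ff in fields:
            address = int(ff.split(",")[0])
            edges.append((tr, address))
    if edges:
        yield current_bucket * window, edges  # last, possibly partial, window


# tiny made-up input: three transactions, two lines each
demo_lines = [
    "0,1;10,5", "11,5",
    "0,2;11,3", "12,3",
    "0,3;12,1", "10,1",
]
result = list(edges_per_window(demo_lines, window=2))
```

With a real file you would pass the open file object as lines and use window=100000; the balance and counter dictionaries can be updated in the same inner loop, so the file is read only once.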

For more alternatives check this answer
