I have the following sample text file:
DATASET
OBJTYPE "mesh2d"
BEGALD
ND 58673
NC 116294
TIMEUNITS SECONDS
TS 0 1.98849600e+08
0.000000000e+00
0.56000000e+00
0.200000000e+00
0.00000000e+00
0.100000000e+00
0.00000000e+00
0.00000000e+00
0.73400000e+00
TS 0 1.98853209e+08
0.00000000e+00
1.00500000e+00
4.00000000e+00
6.00000000e-05
9.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98856959e+08
0.00000000e+00
1.38000000e+00
4.00000000e+00
3.00000000e-05
8.10000000e+00
2.45000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98860419e+08
0.00000000e+00
1.40000000e+00
7.00000000e+00
3.00000000e-05
9.00000000e+00
0.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98864081e+08
0.00000000e+00
0.00000000e+00
0.00000000e+00
3.00000000e-05
0.00000000e+00
0.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98867619e+08
0.00000000e+00
0.00000000e+00
8.00000000e+00
3.50000000e-05
10.00000000e+00
0.00000000e+00
5.50000000e+00
0.00000000e+00
ENDDS
I want to extract the time stamp from each line starting with 'TS 0 ', along with the 2nd, 5th and 8th lines after every 'TS 0 ' match. The file is huge (more than 10 GB), so I don't want to read the whole file into memory.
This is what I could come up with:
with open(r"file") as f:
    for line in f:
        if line.startswith("TIMEUNITS SECONDS"):
            break  # the file iterator continues from the next line
    time = []   # list for storing time stamps
    line2 = []  # or lines = [2, 5, 8]
    line5 = []
    line8 = []
    for line in f:
        if line.startswith("TS"):
            print(line.strip())  # extract all TS
            ts = float(line.split()[2])
            time.append(ts)
It only extracts the time stamps. How can I extract the 2nd, 5th and 8th lines using a loop, or some other faster method, without reading the whole file?
CodePudding user response:
A file object is its own iterator in Python, so it retains its position across separate for loops and calls to next(). That is what let you skip the initial section with a break. Keep using the same technique to find the lines you need:
with open(r"file") as f:
    # Skip the header; the iterator stops just after this line.
    for line in f:
        if line.startswith("TIMEUNITS SECONDS"):
            break
    time = []
    line2 = []
    line5 = []
    line8 = []
    for line in f:
        if line.startswith("TS"):
            ts = float(line.strip().split()[2])
            time.append(ts)
            # Advance 2 lines: 'line' is now the 2nd line after TS.
            for _ in range(2):
                line = next(f)
            line2.append(float(line.strip()))
            # Advance 3 more lines: the 5th line after TS.
            for _ in range(3):
                line = next(f)
            line5.append(float(line.strip()))
            # Advance 3 more lines: the 8th line after TS.
            for _ in range(3):
                line = next(f)
            line8.append(float(line.strip()))
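If mixing a for loop with next() on the same object looks suspicious, here is a minimal sketch of why it works. It uses io.StringIO as a stand-in for an open file, and the text in it is made up for illustration:

```python
import io

# io.StringIO stands in for an open text file; the contents are made up.
f = io.StringIO("header\nTS 0 1.0e+00\na\nb\nc\n")

for line in f:
    if line.startswith("TS"):
        break  # the iterator now sits just after the TS line

first = next(f).strip()              # first line after TS
second = next(f).strip()             # second line after TS
rest = [line.strip() for line in f]  # a new loop resumes where next() stopped

print(first, second, rest)
```

Every loop and every next() call pulls from the same underlying position, so nothing is read twice and nothing is held in memory beyond the current line.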
Now that you have the basic structure down, you can factor out the repeated code into a function and add some error checking:
def find(file, s):
    """Return the next line starting with s, or None at end of file."""
    for line in file:
        if line.startswith(s):
            return line
    return None

def skip(file, n):
    """Advance n lines and return the n-th, or None if the file ends first."""
    i, line = -1, None  # defaults in case the file is already exhausted
    for i, line in zip(range(n), file):
        pass
    return line if i == n - 1 else None

def load(filename):
    with open(filename) as f:
        if not find(f, "TIMEUNITS SECONDS"):
            return None
        time = []
        line2 = []
        line5 = []
        line8 = []
        while True:
            if not (line := find(f, "TS")):
                break
            time.append(float(line.strip().split()[2]))
            if not (line := skip(f, 2)):
                break
            line2.append(float(line.strip()))
            if not (line := skip(f, 3)):
                break
            line5.append(float(line.strip()))
            if not (line := skip(f, 3)):
                break
            line8.append(float(line.strip()))
        return time, line2, line5, line8
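If you would rather not hold even the result lists in memory, the same idea can be written as a generator. This is only a sketch under some assumptions: iter_records and its offsets parameter are names I made up, every TS block is assumed to have at least 8 value lines, and itertools.islice does the skipping. The sample text is invented so the snippet runs on its own.

```python
import io
from itertools import islice

def iter_records(lines, offsets=(2, 5, 8)):
    """Yield (timestamp, values) per TS block, reading one line at a time.

    offsets are 1-based line positions after the TS line, in ascending order.
    """
    lines = iter(lines)
    for line in lines:
        if not line.startswith("TS"):
            continue
        ts = float(line.split()[2])
        values = []
        previous = 0
        for offset in offsets:
            step = offset - previous
            # islice(lines, step - 1, step) skips step - 1 lines, yields the next.
            picked = next(islice(lines, step - 1, step), None)
            if picked is None:
                return  # input ended in the middle of a block
            values.append(float(picked))
            previous = offset
        yield ts, values

# Made-up sample; with a real file, pass the open file object instead.
sample = io.StringIO(
    "TIMEUNITS SECONDS\n"
    "TS 0 1.0e+02\n"
    "1\n2\n3\n4\n5\n6\n7\n8\n"
    "TS 0 2.0e+02\n"
    "10\n20\n30\n40\n50\n60\n70\n80\n"
)
records = list(iter_records(sample))
print(records)
```

With a real file you would loop over iter_records(f) inside the with block and process each record as it arrives, rather than calling list() on it.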