Find number in line s.startswith python

Time:07-27

For example, I have the following text fragment in a file:


         Item               Value     Threshold  Converged?
 Maximum Force            0.009497     0.000450     NO 
 RMS     Force            0.002723     0.000300     NO 
 Maximum Displacement     0.247463     0.001800     NO 
 RMS     Displacement     0.065734     0.001200     NO 

 SCF Done:  E(RPW91-PW91) =  -2381.36459172     A.U. after    1 cycles

I am trying to extract the two numbers in the Value and Threshold columns for each line. For example, consider the first line:

 Maximum Force            0.009497     0.000450     NO 

From that line I want to get the value 0.009497. This is my code:
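Because the columns are separated by runs of whitespace, `str.split()` with no argument is enough to pull out both numbers from one row. A minimal sketch, assuming every table row has the layout `<word> <word> <value> <threshold> <converged>`; `parse_convergence_row` is a hypothetical helper name, not part of the original script:

```python
from typing import Tuple


def parse_convergence_row(line: str) -> Tuple[str, float, float]:
    """Split a convergence-table row into (item name, value, threshold).

    Assumes a row such as ' Maximum Force   0.009497   0.000450   NO'.
    str.split() with no argument splits on any run of spaces or tabs and
    ignores leading/trailing whitespace, so the exact spacing does not matter.
    """
    fields = line.split()
    name = " ".join(fields[:2])   # e.g. 'Maximum Force'
    value = float(fields[2])      # 'Value' column
    threshold = float(fields[3])  # 'Threshold' column
    return name, value, threshold


row = " Maximum Force            0.009497     0.000450     NO "
print(parse_convergence_row(row))  # ('Maximum Force', 0.009497, 0.00045)
```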

#!/usr/bin/python3.6
import glob
import os


def CutMAXFValue( Line: str) -> float:
    return float(Line.split()[2]) 

def CutSCFValue( Line: str) -> float:
    return float(Line.split()[4]) 

def GrepSCF( Filename: str , StartStep = 1):
    #print(Filename)
    result = list()
    Step = StartStep
    with open(Filename, 'r') as f:
        lines = f.readlines()
    for s in lines:
        if s.startswith("SCF Done:"):
            result.append( (Step, CutSCFValue(s) ) )
            Step += 1
    return result  


def GrepMAXF( Filename: str , StartStep = 1):
    #print(Filename)
    result = list()
    Step = StartStep
    with open(Filename, 'r') as f:
        lines = f.readlines()
    for s in lines:
        if s.startswith("Maximum Force"):
            result.append( (Step, CutMAXFValue(s) ) )
            Step += 1
    return result  


def DraftListSteps(f: str):
    result = list()
    SubFiles = GetOutFilesBeginsWith( f ) #support function to read from file
    MaxRerun = GetMaxRerun( SubFiles )
    startstep = 1
    for rerun in range(MaxRerun + 1):
        RFiles = GetFilesForRerun( SubFiles, rerun )
        DoneFile = next( x for x in RFiles if x.endswith("out") or x.endswith("outERR") )
        MAXF = GrepMAXF(DoneFile, startstep)
        MinStep = min(MAXF, key = lambda x: x[1] )
        startstep = MinStep[0]
        result.append(MAXF)

        # SCF = GrepSCF(DoneFile, startstep)
        # MinStep = min(MAXF, key = lambda x: x[1] )
        # startstep = MinStep[0]
        # result.append(SCF)
    return result  
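If the rows in the file begin with a space, as they do in the fragment shown above, or have a tab between the words, then `s.startswith("Maximum Force")` never matches and `GrepMAXF` returns an empty list. A sketch of a whitespace-tolerant variant, assuming the same `(step, value)` output as `GrepMAXF`; `grep_max_force` is a hypothetical name, and it takes any iterable of lines (an open file works) so it can be tried without a file:

```python
from typing import Iterable, List, Tuple


def grep_max_force(lines: Iterable[str], start_step: int = 1) -> List[Tuple[int, float]]:
    """Return (step, value) pairs for every 'Maximum Force' row.

    Comparing the first two whitespace-separated words, instead of calling
    startswith() on the raw line, tolerates leading spaces and tabs.
    """
    result = []
    step = start_step
    for s in lines:
        fields = s.split()
        if fields[:2] == ["Maximum", "Force"]:
            result.append((step, float(fields[2])))  # 'Value' column
            step += 1
    return result


sample = [
    "         Item               Value     Threshold  Converged?\n",
    " Maximum Force            0.009497     0.000450     NO \n",
    " RMS     Force            0.002723     0.000300     NO \n",
    " Maximum\tForce            0.004100     0.000450     NO \n",
]
print(grep_max_force(sample))  # [(1, 0.009497), (2, 0.0041)]
```

With a real file, the open file object can be passed directly, e.g. `with open(Filename) as f: MAXF = grep_max_force(f, startstep)`.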

After running the script, I get this error:

Traceback (most recent call last):
  File "forker.py", line 123, in <module>
    draft = DraftListSteps( f )
  File "forker.py", line 91, in DraftListSteps
    MinStep = min(MAXF, key = lambda x: x[1] )
ValueError: min() arg is an empty sequence
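The traceback means `GrepMAXF` found no matching lines, so `MAXF` was an empty list and `min()` had nothing to compare. Since Python 3.4, `min()` accepts a `default` keyword that is returned instead of raising for an empty iterable. A sketch of guarding that step with it; `pick_min_step` is a hypothetical helper, and what to do when nothing matched (skip the rerun, or raise an error naming `DoneFile`) is left to the caller as an assumption:

```python
from typing import List, Optional, Tuple


def pick_min_step(maxf: List[Tuple[int, float]]) -> Optional[Tuple[int, float]]:
    """Return the (step, value) pair with the smallest value, or None if empty."""
    # default= avoids "ValueError: min() arg is an empty sequence"
    return min(maxf, key=lambda x: x[1], default=None)


print(pick_min_step([(1, 0.009497), (2, 0.0041), (3, 0.0052)]))  # (2, 0.0041)
print(pick_min_step([]))  # None
```

Returning `None` only hides the symptom; if `MAXF` is empty for a file that should contain the table, the matching condition in `GrepMAXF` is what needs fixing.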

How can I fix this error and extract the required value? A regular expression didn't work in this case either, though maybe I wrote the regex pattern wrong...

If I extract the value -2381.36459172 from the SCF Done line, this code works perfectly, but the same approach for getting 0.009497 does not work...

Answer:

I'll bet anything it's a TAB between Maximum and Force, so change

if s.startswith("Maximum Force"):

to

if s.startswith("Maximum\tForce"):

Or you can handle either with a regular expression:

if re.match(r'Maximum\s+Force', s):   # requires "import re"
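A regular expression can also match the row and capture both numbers in one step. A sketch, assuming the numbers are plain decimals like `0.009497` (no exponent); `MAX_FORCE_RE` and `match_max_force` are hypothetical names. `re.match` only matches at the start of the string, so the leading `\s*` is what lets the pattern accept lines that begin with a space:

```python
import re
from typing import Optional, Tuple

# \s* allows leading whitespace; \s+ allows one or more spaces or tabs.
MAX_FORCE_RE = re.compile(r"\s*Maximum\s+Force\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)")


def match_max_force(line: str) -> Optional[Tuple[float, float]]:
    """Return (value, threshold) as floats, or None for any other line."""
    m = MAX_FORCE_RE.match(line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


print(match_max_force(" Maximum Force            0.009497     0.000450     NO "))  # (0.009497, 0.00045)
print(match_max_force(" RMS     Force            0.002723     0.000300     NO "))  # None
```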

Answer:

An example of reading this using csv (without actually bothering with a Dialect):

import csv

with open(inputfile_name) as infile:
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    datarows = [row for row in reader if row and row[0] != "Item"] #filter out blank lines and headers
    print(datarows)

Or, if it's tab-delimited:

with open(inputfile_name_t) as infile:
    reader = csv.reader(infile, delimiter='\t', skipinitialspace=True)
    datarows = [row for row in reader if len(row) > 1 and row[1] != "Item"] #filter out blank lines and headers
    print(datarows)

prints (cleaned up for clarity):

[['Maximum', 'Force', '0.009497', '0.000450', 'NO', ''], 
 ['RMS', 'Force', '0.002723', '0.000300', 'NO', ''],
 ['Maximum', 'Displacement', '0.247463', '0.001800', 'NO', ''],
 ['RMS', 'Displacement', '0.065734', '0.001200', 'NO', '']]

That's for the space-delimited version; the tab-delimited version produces slightly different output, but it still gets you where you need to go.

You might need to adjust the subscript in the filtering line, depending on whether there's a tab or space at start-of-line.
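Once the rows are lists of strings, a lookup keyed by the item name makes it easy to pick out the Value and Threshold columns. A sketch, assuming the space-delimited layout from the question; `convergence_table` is a hypothetical helper, and `io.StringIO` stands in for a real file so the example is self-contained. Empty fields are dropped, and rows whose third and fourth fields are not numbers (the header, the `SCF Done` line) are skipped:

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def convergence_table(lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each item name (e.g. 'Maximum Force') to (value, threshold)."""
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    table = {}
    for row in reader:
        row = [field for field in row if field]  # drop empty fields, e.g. from trailing spaces
        if len(row) < 4 or row[0] == "Item":
            continue  # blank line or header row
        try:
            value, threshold = float(row[2]), float(row[3])
        except ValueError:
            continue  # not a table row, e.g. the 'SCF Done' line
        table[" ".join(row[:2])] = (value, threshold)
    return table


sample = io.StringIO(
    "         Item               Value     Threshold  Converged?\n"
    " Maximum Force            0.009497     0.000450     NO\n"
    " RMS     Force            0.002723     0.000300     NO\n"
    "\n"
    " SCF Done:  E(RPW91-PW91) =  -2381.36459172     A.U. after    1 cycles\n"
)
table = convergence_table(sample)
print(table["Maximum Force"])  # (0.009497, 0.00045)
```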
