Extract lines between two sentences in a file

Time:11-02

I have a file that contains the following information:

number of atoms
   2
Atom labels, atom number, coords (Angstrom)
H    1    0.00000000    0.00000000    0.00000000
H    1    0.00000000    0.00000000    0.74080000
Overall charge
   0
Number of basis funcs
   2
Maximum number of primitives

I want to extract the lines between the line that starts with "Atom" and the line that starts with "Overall". I tried the code below, but instead of getting

H    1    0.00000000    0.00000000    0.00000000
H    1    0.00000000    0.00000000    0.74080000

I got an empty file without any lines.

infile = open('h2_sample.input','r')
ouput = open('coordinate.txt','w')
copy = False
coordinate = []
for line in infile:
    if line.strip() == "(Angstrom)":
        copy = True
        coordinate = []  
    elif line.strip() == 'Overall':
        copy = False 
        for strings in coordinate:
            output.write(strings + '\n')
    elif copy:
        coordinate.append(line.strip())

What do you think I did wrong?

CodePudding user response:

  1. Read all the lines of the file
  2. Find the lines you want to keep
  3. Write to the output file

with open("h2_sample.input") as infile:
    lines = infile.read().splitlines()

# Index of the first line starting with each marker
start = [i for i, line in enumerate(lines) if line.startswith("Atom ")][0]
end = [i for i, line in enumerate(lines) if line.startswith("Overall")][0]

with open("coordinate.txt", "w") as outfile:
    # Keep only the lines strictly between the two markers
    outfile.write("\n".join(lines[start + 1:end]))

Output:

coordinate.txt:

H    1    0.00000000    0.00000000    0.00000000
H    1    0.00000000    0.00000000    0.74080000
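
One caveat with the index lookups above: taking [0] of an empty list raises an IndexError if either marker line is missing. A minimal sketch of a more defensive variant follows. The helper name lines_between and its choice to return an empty list are assumptions for illustration, not part of the original answer:

```python
def lines_between(lines, start_prefix, end_prefix):
    """Return the lines strictly between the first line starting with
    start_prefix and the next line starting with end_prefix.

    Hypothetical helper: returns [] if either marker is missing.
    """
    # Index of the first line that begins with the start marker, or None.
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(start_prefix)), None)
    if start is None:
        return []
    # Look for the end marker only after the start marker.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith(end_prefix)), None)
    if end is None:
        return []
    return lines[start + 1:end]


sample = [
    "number of atoms",
    "   2",
    "Atom labels, atom number, coords (Angstrom)",
    "H    1    0.00000000    0.00000000    0.00000000",
    "H    1    0.00000000    0.00000000    0.74080000",
    "Overall charge",
    "   0",
]
coords = lines_between(sample, "Atom ", "Overall")
print("\n".join(coords))
```

Whether a missing marker should give an empty result or raise an error depends on how the input files are produced; the empty list here is just one option.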

CodePudding user response:

EDIT: As replies have pointed out, you're likely attempting to check if your strings are in the line. This is also a valid approach, I've edited the code block to do this. (My original answer is also below.)


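In Python, string.strip() removes leading and trailing whitespace; it does not return the first or last word of the line (which is how you're using it). You also need to call output.close() so that the buffered data is actually written to the file. Calling .readlines() on infile turns it into a list of lines, although this is optional, since iterating over a file object already yields its lines one at a time.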
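
To make the difference concrete, here is a small sketch using the header line from the question. The variable names are made up for the illustration:

```python
line = "Atom labels, atom number, coords (Angstrom)\n"

# strip() only trims surrounding whitespace (here, the trailing newline).
stripped = line.strip()

# Comparing the whole stripped line against a single word is False...
equals_word = stripped == "(Angstrom)"

# ...while substring, prefix, and suffix checks all succeed.
contains_word = "(Angstrom)" in line
starts_with_atom = line.startswith("Atom")
ends_with_word = stripped.endswith("(Angstrom)")

print(equals_word, contains_word, starts_with_atom, ends_with_word)
# prints: False True True True
```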


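infile = open('h2_sample.input', 'r').readlines()
output = open('coordinate.txt', 'w')
copy = False
coordinate = []
for line in infile:
    if '(Angstrom)' in line:
        # Start marker found: begin collecting lines
        copy = True
        coordinate = []
    elif 'Overall' in line:
        # End marker found: write out what was collected
        copy = False
        for strings in coordinate:
            output.write(strings + '\n')
    elif copy:
        # Drop the trailing newline; it is added back when writing
        coordinate.append(line.rstrip('\n'))

output.close()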

CodePudding user response:

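First, you should use a context manager, because you never close the files infile and ouput (alternatively, call close() on both at the end). Note also that ouput is a misspelling: the code later calls output.write, which would raise a NameError once that line is reached. Try:

with open('h2_sample.input','r') as infile, open('coordinate.txt','w') as output:

Then indent the rest of the code under the with statement. The line if line.strip() == "(Angstrom)": checks whether the whole stripped line is equal to "(Angstrom)", which is never true for this file. You should use:

if "(Angstrom)" in line.strip():

It will return True if "(Angstrom)" appears anywhere in the line. The same goes for the first elif statement, which should check whether "Overall" is in the line.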
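
Putting these suggestions together, a minimal sketch might look like the following. The function name copy_section, its default markers, and its return value are assumptions made for the example and are not taken from the question:

```python
def copy_section(in_path, out_path,
                 start_marker="(Angstrom)", end_marker="Overall"):
    """Copy the lines found between the two marker lines.

    Hypothetical helper: returns the number of lines written.
    """
    copying = False
    written = 0
    # The with statement closes both files, even if an error occurs.
    with open(in_path, "r") as infile, open(out_path, "w") as output:
        for line in infile:
            if not copying:
                # Still looking for the start marker.
                if start_marker in line:
                    copying = True
            elif end_marker in line:
                # End marker reached: stop reading.
                break
            else:
                output.write(line.strip() + "\n")
                written += 1
    return written
```

With the sample file from the question, calling copy_section('h2_sample.input', 'coordinate.txt') should write the two H lines to coordinate.txt and return 2.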
