I want to keep only the lines before a certain string in a txt file

I want to keep all the lines before the line that contains the string 'VarList'. I cannot understand why the solutions proposed elsewhere do not work for my txt files.

To simplify:

I have many .txt files that look like this:

text1=text
text2=text
(...)
textN=text
VarList=text
(...)
End

I just want this:

text1=text
text2=text
(...)
textN=text

How can I do this for all the txt files in a directory?

First I tried this:

import os

for subdir, dirs, files in os.walk('C:\\Users\\nigel\\OneDrive\\Documents\\LAB\\lean\\.txt'):
    for file in files:
        output=[]
        with open(file, 'r') as inF:
            for line in inF:
                output.append(line)
                if 'VarList' in line: break
        f=open(file, 'w')
        blank=['']
        [f.write(x) for x in output]
        [f.write(x + '\n') for x in blank]
        f.close()

Nothing at all changes in the txt file, even though one of its lines contains the string 'VarList'. So why isn't it working?

Then I tried:

import re

def trim(test_string, removal_string):
    return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

def cleanFile(file_path, removal_string):
    with open(file_path) as master_text:
        return trim(master_text, removal_string)

cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

and I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [2], in <cell line: 16>()
     13     with open(file_path) as master_text:
     14         return trim(master_text, removal_string)
---> 16 cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

Input In [2], in cleanFile(file_path, removal_string)
     12 def cleanFile(file_path, removal_string):
     13     with open(file_path) as master_text:
---> 14         return trim(master_text, removal_string)

Input In [2], in trim(test_string, removal_string)
      9 def trim(test_string, removal_string):
---> 10     return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

File ~\Anaconda3\lib\re.py:210, in sub(pattern, repl, string, count, flags)
    203 def sub(pattern, repl, string, count=0, flags=0):
    204     """Return the string obtained by replacing the leftmost
    205     non-overlapping occurrences of the pattern in string by the
    206     replacement repl.  repl can be either a string or a callable;
    207     if a string, backslash escapes in it are processed.  If it is
    208     a callable, it's passed the Match object and must return
    209     a replacement string to be used."""
--> 210     return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

Finally, I tried:

with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
    head, sep, tail = importFile.partition('VarList')
    exportFile = head

importFile.close()
exportFile.close()

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 # Solution 3
      3 with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
----> 4     head, sep, tail = importFile.partition('VarList')
      5     exportFile = head
      7 importFile.close()

AttributeError: '_io.TextIOWrapper' object has no attribute 'partition'

Does anyone have a clue about what is going on here?

CodePudding user response：

You're appending each line to output before you check it for "VarList", so the "VarList" line itself is kept. Check first, then append:

with open(file, 'r') as inF:
    for line in inF:
        if 'VarList' in line:
            break
        output.append(line)
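
That reordering alone may not explain why nothing changes at all. Two other details in the question's loop look relevant: os.walk is given a path ending in '\\.txt', and when that path is not an existing directory os.walk yields nothing, so the loop body never runs; and the names in files are bare file names, so they need to be joined with subdir before opening. A minimal sketch of the whole loop under those assumptions (the function name keep_lines_before and its parameters are made up for illustration):

```python
import os


def keep_lines_before(root_dir, marker):
    """Truncate every .txt file under root_dir at the first line containing marker."""
    # os.walk expects a directory; for a path that is not an existing
    # directory it yields nothing, so the loop body would never run.
    for subdir, dirs, files in os.walk(root_dir):
        for name in files:
            if not name.endswith('.txt'):
                continue
            # os.walk returns bare file names, so join them with subdir.
            path = os.path.join(subdir, name)
            output = []
            with open(path, 'r') as inF:
                for line in inF:
                    # Check first, then append, so the marker line is dropped.
                    if marker in line:
                        break
                    output.append(line)
            # Overwrite the file with only the lines kept above.
            with open(path, 'w') as outF:
                outF.writelines(output)
```

It would be called with the directory itself rather than a '.txt' pattern, for example keep_lines_before(r'C:\Users\nigel\OneDrive\Documents\LAB\lean', 'VarList'). Since it overwrites files in place, trying it on a copy of the directory first seems prudent.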

CodePudding user response：

I think this task could be made easier by using Python's pathlib as it has some useful methods for reading and writing text files.

pathlib also has glob functionality that allows the addition of “**” to mean “this directory and all subdirectories, recursively”.

To truncate the file, I have used a list comprehension to find the lines that start with the required string and then sliced the list of lines at the first match.

For example:

from pathlib import Path


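def trim_files(dirname: Path, end_before: str) -> None:
    # "**/*.txt" matches .txt files in dirname and all of its subdirectories.
    for file in dirname.glob("**/*.txt"):
        content = file.read_text().splitlines()
        # Indices of every line that starts with the marker string.
        location = [index for index, line in enumerate(content)
                    if line.startswith(end_before)]
        if location:
            # Keep only the lines before the first match.
            # Note: the rewritten file has no trailing newline.
            file.write_text("\n".join(content[:location[0]]))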


if __name__ == '__main__':
    search_directory = Path.home().joinpath('Documents', 'LAB')
    trim_files(search_directory, 'VarList')
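
The second and third attempts in the question fail for the same reason: re.sub and partition both work on a str, but they were handed the open file object rather than its contents. Reading the text first avoids both errors. A rough sketch using str.partition, where text_before and trim_file are hypothetical helper names and not part of pathlib:

```python
from pathlib import Path


def text_before(text: str, marker: str) -> str:
    """Return the part of text that precedes the line containing marker."""
    head, sep, _tail = text.partition(marker)
    if not sep:
        # Marker not found: leave the text unchanged.
        return text
    # Drop any partial line in front of the marker, keeping the text
    # up to and including the last newline before it.
    return head[:head.rfind('\n') + 1]


def trim_file(path: Path, marker: str) -> None:
    # read_text() returns a str, which is what partition needs.
    path.write_text(text_before(path.read_text(), marker))
```

Unlike the startswith check above, this matches the marker anywhere in a line, which is closer to the 'VarList' in line test from the question.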