I want to keep all the lines that come before the line containing the string 'VarList'. I can't understand why the solutions proposed elsewhere don't work for my txt files.
To simplify:
I have many .txt files that look like this:
text1=text
text2=text
(...)
textN=text
VarList=text
(...)
End
I just want this:
text1=text
text2=text
(...)
textN=text
How can I do this for all the .txt files in a directory?
First, I tried this:
import os
for subdir, dirs, files in os.walk('C:\\Users\\nigel\\OneDrive\\Documents\\LAB\\lean\\.txt'):
    for file in files:
        output=[]
        with open(file, 'r') as inF:
            for line in inF:
                output.append(line)
                if 'VarList' in line: break
        f=open(file, 'w')
        blank=['']
        [f.write(x) for x in output]
        [f.write(x + '\n') for x in blank]
        f.close()
Nothing changes in the txt file, even though the file does contain the string 'VarList' on one of its lines. Why isn't it working?
Then:
import re

def trim(test_string, removal_string):
    return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

def cleanFile(file_path, removal_string):
    with open(file_path) as master_text:
        return trim(master_text, removal_string)

cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')
and I get this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [2], in <cell line: 16>()
     13 with open(file_path) as master_text:
     14     return trim(master_text, removal_string)
---> 16 cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

Input In [2], in cleanFile(file_path, removal_string)
     12 def cleanFile(file_path, removal_string):
     13     with open(file_path) as master_text:
---> 14         return trim(master_text, removal_string)

Input In [2], in trim(test_string, removal_string)
      9 def trim(test_string, removal_string):
---> 10     return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

File ~\Anaconda3\lib\re.py:210, in sub(pattern, repl, string, count, flags)
    203 def sub(pattern, repl, string, count=0, flags=0):
    204     """Return the string obtained by replacing the leftmost
    205     non-overlapping occurrences of the pattern in string by the
    206     replacement repl. repl can be either a string or a callable;
    207     if a string, backslash escapes in it are processed. If it is
    208     a callable, it's passed the Match object and must return
    209     a replacement string to be used."""
--> 210     return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object
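The TypeError arises because master_text is the file object returned by open(), while re.sub expects a string; the file has to be read first. A minimal sketch of a string-based version, assuming the whole file fits in memory and that everything from the first line containing the marker onward should be dropped (trim_text is a hypothetical name, not from the question):

```python
import re


def trim_text(text: str, marker: str) -> str:
    """Return the part of text that comes before the first line containing marker."""
    # (?m): ^ matches at the start of every line.
    # (?s): . also matches newlines, so .* runs on to the end of the text.
    # [^\n]* cannot cross a newline, so the match starts at the marker's own line.
    pattern = r"(?ms)^[^\n]*" + re.escape(marker) + r".*\Z"
    return re.sub(pattern, "", text, count=1)


sample = "text1=text\ntext2=text\nVarList=text\nmore=text\nEnd\n"
print(trim_text(sample, "VarList"), end="")
```

In cleanFile, this would mean calling trim_text(master_text.read(), removal_string) instead of passing master_text itself.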
Finally, I tried:
with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
    head, sep, tail = importFile.partition('VarList')
    exportFile = head

importFile.close()
exportFile.close()
Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 # Solution 3
      3 with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
----> 4     head, sep, tail = importFile.partition('VarList')
      5     exportFile = head
      7 importFile.close()

AttributeError: '_io.TextIOWrapper' object has no attribute 'partition'
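partition is a str method, not a method of the file object returned by open(), so the contents have to be read into a string first. Also, exportFile = head only rebinds the name; nothing reaches the output file unless write() is called. A sketch of the partition approach under those fixes, using a throwaway directory and made-up file contents so it runs anywhere (copy_before_marker is a hypothetical name):

```python
import tempfile
from pathlib import Path


def copy_before_marker(src: Path, dst: Path, marker: str) -> None:
    """Write to dst everything in src that precedes the line containing marker."""
    with open(src, "r") as import_file, open(dst, "w") as export_file:
        text = import_file.read()  # a str, which does have partition()
        head, sep, tail = text.partition(marker)
        if sep:
            # Cut back to the last newline so any text before the marker
            # on the same line is excluded as well.
            head = head[: head.rfind("\n") + 1]
        export_file.write(head)  # actually write; rebinding the name writes nothing


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.txt"
    dst = Path(tmp) / "sample_temp.txt"
    src.write_text("text1=text\ntext2=text\nVarList=text\nEnd\n")
    copy_before_marker(src, dst, "VarList")
    print(dst.read_text(), end="")
```

The with statement already closes both files, so no explicit close() calls are needed.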
Does anyone know what is going on here?
Answer:
You're appending each line to output before checking it for "VarList", so the VarList line itself ends up in the output. Check first, then append:
with open(file, 'r') as inF:
    for line in inF:
        if 'VarList' in line:
            break
        output.append(line)
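Two other details of the question's loop may explain why nothing changes at all. os.walk yields nothing, and raises no error by default, when the path it is given is not an existing directory, and the path in the question ends in \\.txt, which looks like a file pattern rather than a folder. The names in files are also bare file names, so they must be joined with subdir before opening. A sketch combining these fixes with the reordering above (trim_dir is a hypothetical name; the demo uses a temporary folder with made-up contents):

```python
import os
import tempfile


def trim_dir(root: str, marker: str = "VarList") -> None:
    """Cut every .txt file under root just before the first line containing marker."""
    if not os.path.isdir(root):
        # os.walk() would silently yield nothing here, so fail loudly instead.
        raise ValueError(f"{root!r} is not an existing directory")
    for subdir, dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(subdir, name)  # files holds bare names, not full paths
            output = []
            found = False
            with open(path, "r") as in_f:
                for line in in_f:
                    if marker in line:
                        found = True
                        break  # stop before the marker line is appended
                    output.append(line)
            if found:  # leave files without the marker untouched
                with open(path, "w") as out_f:
                    out_f.writelines(output)


# Demo on a throwaway folder with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("text1=text\ntext2=text\nVarList=text\nEnd\n")
    trim_dir(tmp)
    with open(sample) as f:
        print(f.read(), end="")
```

For the question's files, the call would be something like trim_dir(r'C:\Users\nigel\OneDrive\Documents\LAB\lean'), assuming that is the folder that actually holds them.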
Answer:
I think this task could be made easier by using Python's pathlib, which has convenient methods for reading and writing text files.
pathlib also has glob functionality, in which “**” means “this directory and all subdirectories, recursively”.
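As a small illustration of the recursive pattern, this sketch builds a throwaway directory tree (the file names are made up) and lists what Path.glob("**/*.txt") finds; Path.rglob("*.txt") is the standard shorthand for the same search.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)
    for rel in ["a.txt", "sub/b.txt", "sub/deeper/c.txt", "sub/notes.csv"]:
        (root / rel).write_text("text1=text\n")

    # "**" matches this directory and every subdirectory, recursively.
    found = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.txt"))
    print(found)  # ['a.txt', 'sub/b.txt', 'sub/deeper/c.txt']

    # rglob(pattern) behaves like glob("**/" + pattern).
    assert found == sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
```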
To truncate each file, I use a list comprehension to find the line that starts with the required string and then slice the list of lines at that point.
For example:
from pathlib import Path


def trim_files(dirname: Path, end_before: str) -> None:
    for file in dirname.glob("**/*.txt"):
        content = file.read_text().splitlines()
        # Indices of all lines that start with the marker string.
        location = [index for index, line in enumerate(content)
                    if line.startswith(end_before)]
        if location:
            # Keep only the lines before the first marker line.
            file.write_text("\n".join(content[:location[0]]))


if __name__ == '__main__':
    search_directory = Path.home().joinpath('Documents', 'LAB')
    trim_files(search_directory, 'VarList')
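A variant of the same idea swaps the index lookup for itertools.takewhile, which stops at the first line starting with the marker. This is only a sketch (trim_files_takewhile is a hypothetical name); like the function above it only rewrites files that contain the marker, and because it splits with keepends=True it also preserves the newline at the end of the last kept line.

```python
import tempfile
from itertools import takewhile
from pathlib import Path


def trim_files_takewhile(dirname: Path, end_before: str) -> None:
    """Truncate each .txt file under dirname before the first line starting with end_before."""
    for file in dirname.glob("**/*.txt"):
        lines = file.read_text().splitlines(keepends=True)
        # takewhile yields lines until the predicate first fails, i.e. until the marker line.
        kept = list(takewhile(lambda line: not line.startswith(end_before), lines))
        if len(kept) < len(lines):  # only rewrite files that contain the marker
            file.write_text("".join(kept))


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_01.02_R00.txt"
    sample.write_text("text1=text\ntext2=text\nVarList=text\nEnd\n")
    trim_files_takewhile(Path(tmp), "VarList")
    print(sample.read_text(), end="")
```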