I have a .txt file that contains a special character that looks strange (screenshot). When I copy it with Ctrl+C, it pastes as an up arrow (screenshot). How do I remove this character and the blank lines, along with the unnecessary header rows that repeat every few rows? My attempt:
remove_text = ['Trial Balance - Total Currency', 'Page:', 'Currency:', 'Balance Type:', 'ENTITY Range:', 'Ledger:', 'ENTITY:', '------------', '']
with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        for char in line:
            if char in "":
                line.replace(char,'')
        newfile.write(line)
with open('MICnew.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in remove_text):
            newfile.write(line)
with open('MICnew.txt','r+') as file:
    for line in file:
        if not line.isspace():
            file.write(line)
My code deletes some of the unnecessary text and the lines it is on, but it does not delete the special character or the blank lines.
CodePudding user response:
You can delete any non-ASCII character from a string with the following:
cleaned_string = string_to_clean.encode("ascii", "ignore").decode()
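Applied to the file from the question, this could look roughly like the sketch below. The helper names `strip_non_ascii` and `clean_file` are made up for illustration, and the UTF-8 encoding is an assumption about the input file. One caveat: this only removes characters outside the ASCII range, so if the strange character turns out to be an ASCII control character (a form feed, for example), it will survive this step.

```python
def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character dropped."""
    # errors="ignore" silently skips characters that cannot be encoded as ASCII.
    return text.encode("ascii", "ignore").decode("ascii")


def clean_file(src: str, dst: str) -> None:
    """Copy src to dst line by line, stripping non-ASCII characters.

    Assumption: the input is UTF-8; undecodable bytes are replaced and
    then removed by strip_non_ascii. dst must be a different path from
    src, because opening a file with 'w' truncates it immediately.
    """
    with open(src, encoding="utf-8", errors="replace") as oldfile, \
            open(dst, "w", encoding="utf-8") as newfile:
        for line in oldfile:
            newfile.write(strip_non_ascii(line))
```

With the file names from the question, the call would be `clean_file('MIC.txt', 'MICnew.txt')`.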
CodePudding user response:
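Or you can use a regex to get rid of any unnecessary characters. Note that the pattern below keeps only letters, digits, underscores and whitespace, so it also strips punctuation such as decimal points and minus signs.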
import re
with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        newfile.write(re.sub(r'[^a-zA-Z_0-9\s]','',line))
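For completeness, here is a minimal sketch that handles all three parts of the question (special character, blank lines, repeated headers) in a single pass. It rests on a few assumptions: the strange character is either an ASCII control character or a non-ASCII character, the header fragments are the ones listed in the question, and the names `clean_lines`, `clean_file` and `CONTROL_CHARS` are hypothetical. The empty string from the original `remove_text` list is left out on purpose, because `'' in line` is true for every line and would drop the whole file.

```python
import re

# Header fragments taken from the question; a line containing any of them is dropped.
REMOVE_TEXT = ['Trial Balance - Total Currency', 'Page:', 'Currency:',
               'Balance Type:', 'ENTITY Range:', 'Ledger:', 'ENTITY:',
               '------------']

# Assumption: the unwanted character is an ASCII control character
# (anything except tab, newline and carriage return) or non-ASCII.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')


def clean_lines(lines):
    """Yield only the data lines, with unwanted characters removed."""
    for line in lines:
        line = CONTROL_CHARS.sub('', line)                    # control characters
        line = line.encode('ascii', 'ignore').decode('ascii')  # non-ASCII characters
        if not line.strip():                                   # blank or whitespace-only
            continue
        if any(bad in line for bad in REMOVE_TEXT):            # repeated header rows
            continue
        yield line


def clean_file(src, dst):
    """Write the cleaned lines of src to dst (dst must differ from src)."""
    with open(src, errors='replace') as oldfile, open(dst, 'w') as newfile:
        newfile.writelines(clean_lines(oldfile))
```

Reading from one file and writing to a different one matters here: opening `MICnew.txt` with `'w'` while also reading from it truncates the file before any line is read. Unlike the regex above, this version leaves punctuation in the data lines untouched.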