How to optimize reading and cleaning file?

Time:11-23

I have a file that contains strings separated by spaces, tabs, and carriage returns:

one     two

    three

         four

I'm trying to remove all of the spaces, tabs, and carriage returns:

def txt_cleaning(fname):
    new_txt = []
    with open(fname) as f:
        fname = f.readline().strip()
        new_txt += [line.split() for line in f.readlines()]
    return new_txt

Output:

[['one'], ['two'], [], ['three'], [], ['four']]

Expected output, without importing any libraries:

['one', 'two', 'three', 'four']

CodePudding user response:

def txt_cleaning(fname):
    new_text = []
    with open(fname) as f:
        for line in f.readlines():
            new_text += [s.strip() for s in line.split() if s]
    return new_text

Or

def txt_cleaning(fname):
    with open(fname) as f:
        return [word.strip() for word in f.read().split() if word]
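
If the file is large, here is a minimal sketch of a variant that iterates over the file object instead of calling readlines() or read(), so only one line is held in memory at a time. The name txt_cleaning_stream is hypothetical, just for illustration. It relies on split() with no arguments, which already discards every run of whitespace, so no extra strip() is needed:

```python
def txt_cleaning_stream(fname):
    # Hypothetical name; same result as txt_cleaning, built line by line.
    words = []
    with open(fname) as f:
        for line in f:  # the file object yields one line at a time
            # split() with no arguments splits on any run of whitespace
            # and returns [] for blank lines, so nothing empty is added.
            words.extend(line.split())
    return words
```

Whether this is actually faster than the one-liner depends on the file size. For small files, f.read().split() is probably simpler and just as good.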

CodePudding user response:

My method:

  • use read (not readline) to get the whole text as a single string
  • replace tabs and newlines with a space
  • split

def txt_cleaning(fname):
    with open(fname) as f:
        return f.read().replace('\t', ' ').replace('\n', ' ').split()
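
As a side note, the replace steps appear to be optional: str.split() called with no separator treats any run of spaces, tabs, newlines, and carriage returns as a single delimiter. A quick check on an in-memory string (the sample text below is made up to resemble the file in the question):

```python
# Made-up sample mixing spaces, tabs, "\n" and "\r\n" line endings.
sample = "one     two\n\n    three\r\n\n\t\t four\n"

# The approach above: normalise tabs and newlines to spaces, then split.
with_replace = sample.replace('\t', ' ').replace('\n', ' ').split()

# Plain split() with no arguments handles all of that whitespace itself.
without_replace = sample.split()

print(with_replace == without_replace)  # both give the same list of words
print(without_replace)
```

So the replace calls are harmless, but dropping them saves two passes over the text.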