Reading Large Text Content Into Excel Workbook

Time:07-07

I have a large text file with daily entries:

Date - 01031991
Location- Worcester, MA
Status - very long sentence that
        continues over the next few
        lines like so.
Author- Security 87

Date- 01071991
Location - Fort-Devens, MA
Status - another long%$@ sent%$#ence 
        with space and all typ&^%$es
        of characters.
Author - Security 92

I tried to load the data into a dictionary, but realized that would not work: dictionary keys must be unique, so each new entry overwrites the previous one. How should I approach this problem? I want an Excel workbook in which the field names become the column headers, like this:

Date     |   Location      |       Status      |     Author
____________________________________________________________
01031991 | Worcester, MA   |  long sentence    | Security 87
01071991 | Fort-Devens, MA |  long sentence    | Security 92
____________________________________________________________

Answer:

A non-regex approach:

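# read text from file
path = 'entries.txt'  # placeholder: put your file name here
with open(path, 'r') as fd:
    text = fd.read()

# process text line by line, building one dict per entry
records = []
data = {}
last_key = ''
for line in text.split('\n'):
    if not line.strip():
        # a blank line ends the current entry
        if data:
            records.append(data)
            data = {}
    elif line.startswith(' '):
        # an indented line continues the previous field
        data[last_key] += ' ' + line.strip(' -:\n\t')
    else:
        # split "Key - value" or "Key- value" at the first hyphen
        key, _, content = line.partition('-')
        last_key = key.strip()
        data[last_key] = content.strip()
if data:
    # keep the last entry if the file does not end with a blank line
    records.append(data)

# check result
for record in records:
    print(record)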
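
Keeping one dict per entry in a list sidesteps the duplicate-key problem, and such a list maps directly onto the table in the question. The sketch below writes it out as CSV, which Excel opens directly; this is a substitute for a real .xlsx workbook, not something the original answer covers. The function name write_records, the FIELDS list, and the sample records are assumptions made for illustration.

```python
import csv
import io

# Field names taken from the question's sample; adjust to match your file.
FIELDS = ['Date', 'Location', 'Status', 'Author']

def write_records(records, out):
    """Write a list of dicts as CSV rows, one column per field."""
    # restval fills in missing fields; extrasaction='ignore' skips unknown keys
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval='',
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(records)

# Two entries shaped like the question's sample data (Status text shortened).
records = [
    {'Date': '01031991', 'Location': 'Worcester, MA',
     'Status': 'long sentence', 'Author': 'Security 87'},
    {'Date': '01071991', 'Location': 'Fort-Devens, MA',
     'Status': 'long sentence', 'Author': 'Security 92'},
]

# Write to an in-memory buffer here; to create a real file, pass
# open('entries.csv', 'w', newline='') instead (file name is a placeholder).
buffer = io.StringIO()
write_records(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Values that contain commas, such as the locations, are quoted automatically by the csv module. If a native .xlsx file is required, a third-party library such as openpyxl or pandas would be needed instead.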