Reading Large Text Content Into Dictionary

Time:07-07

I have a large text file with daily entries:

Date - 01031991
Location- Worcester, MA
Status - very long sentence that
        continues over the next few
        lines like so.
Author- Security 87

Date- 01071991
Location - Fort-Devens, MA
Status: another long%$@ sent%$#ence 
        with space and all typ&^%$es
        of characters.
Author - Security 92

I am using Python to turn this text file into an Excel workbook. I expect to end up with a workbook whose columns are the fields in this text file, filled with their values. I have written a script as follows:

myfile = open(txtfile, 'r')
dictionary = {}

for line in myfile:
    
    k, v = line.strip().split("-", maxsplit=1)
    dictionary[k] = v
    
myfile.close()

For now, I can't get the entire "Status" sentence, because it wraps across lines: each line ends with a space and a newline, and the next line starts with a run of spaces before the next word. As in, "very long sentence that \n continues over the next few \n ...".
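One way to deal with the wrapping is to undo it before parsing: replace every line break that is followed by indentation with a single space, so each field sits on one line again. A minimal regex sketch; `fold_continuations` is a hypothetical helper name, and it assumes continuation lines always start with at least one space or tab:

```python
import re


def fold_continuations(text):
    """Join wrapped lines back onto the line they continue.

    Assumption: a continuation line starts with at least one space or tab.
    """
    # [ \t]* consumes trailing spaces on the previous line,
    # \n[ \t]+ consumes the line break plus the next line's indentation;
    # the whole match is replaced by a single space.
    return re.sub(r'[ \t]*\n[ \t]+', ' ', text)


sample = (
    "Status - very long sentence that\n"
    "        continues over the next few\n"
    "        lines like so.\n"
    "Author- Security 87\n"
)
print(fold_continuations(sample))
```

After folding, a `split("-", maxsplit=1)` loop sees one line per field, though it still has to skip blank lines and handle the `Status:` line, which uses a colon instead of a hyphen.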

How do I get the entire sentence into my dictionary? Right now, I only get:

print(dictionary)
{'Date ': ' 01031991', 'Location': ' Worcester, MA', 'Status ': ' very long sentence that'}

Answer:

A non-regex approach:

# read text from file
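path = 'entries.txt'  # placeholder: replace with your file name
with open(path, 'r') as fd:
    text = fd.read()

# process text line by line
data = {}
last_key = ''
for line in text.split('\n'):
    if not line.strip():
        continue  # skip blank lines between entries
    if line.startswith(' '):
        # indented line: continuation of the previous field's value
        data[last_key] += ' ' + line.strip(' -:\n\t')
    else:
        # field line such as "Date - ...", "Location- ..." or "Status: ..."
        key, _, content = line.partition(' ')
        key = key.rstrip('-:')  # "Location-" -> "Location", "Status:" -> "Status"
        data[key] = content.strip().lstrip('-:').strip()
        last_key = key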

# check result
for k, v in data.items():
    print(k, v)
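The loop above keeps a single dict, so the second entry overwrites the first, while an Excel sheet needs one row per entry. A regex-based sketch that collects one dict per entry instead, assuming entries are separated by blank lines and each field line starts with a name followed by `-` or `:` (`parse_entries` and `FIELD_RE` are hypothetical names):

```python
import re

# Assumed layout: field name, optional spaces, then "-" or ":" as separator.
# Only the first separator is consumed, so "Fort-Devens" stays intact.
FIELD_RE = re.compile(r'^(\w+)\s*[-:]\s*(.*)$')


def parse_entries(text):
    """Return a list with one dict per entry; blank lines separate entries."""
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # blank line: close the current entry, if there is one
            if current:
                records.append(current)
                current, last_key = {}, None
            continue
        if line[0] in ' \t' and last_key is not None:
            # indented line: continuation of the previous field's value
            current[last_key] += ' ' + line.strip()
            continue
        match = FIELD_RE.match(line)
        if match:
            last_key = match.group(1)
            current[last_key] = match.group(2).strip()
    if current:
        records.append(current)  # last entry has no trailing blank line
    return records
```

Each dict then becomes one spreadsheet row. With pandas and an Excel engine such as openpyxl installed, `pandas.DataFrame(parse_entries(text)).to_excel('entries.xlsx', index=False)` should give one column per field (the output file name is a placeholder).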