I have a large text file with daily entries:
Date - 01031991
Location- Worcester, MA
Status - very long sentence that
continues over the next few
lines like so.
Author- Security 87
Date- 01071991
Location - Fort-Devens, MA
Status: another long%$@ sent%$#ence
with space and all typ&^%$es
of characters.
Author - Security 92
I am using Python to turn this text file into an Excel workbook. I expect to end up with a workbook containing the columns and values in this text file. I have written a script as follows:
myfile = open(txtfile, 'r')
dictionary = {}
for line in myfile:
    k, v = line.strip().split("-", maxsplit=1)
    dictionary[k] = v
myfile.close()
For now, I can't get the entire sentence in "Status" because the value wraps over several lines: each line ends with a space and a newline, and the next line starts with a lot of spaces before the first word. As in, "very long sentence that \n continues over the next few \n ...".
How do I get the entire sentence into my dictionary? Right now, I only get:
print(dictionary)
{'Date ': ' 01031991', 'Location': ' Worcester, MA', 'Status ': ' very long sentence that'}
CodePudding user response:
A non-regex approach:
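# read text from file
path = 'entries.txt'  # placeholder; replace with your file name
with open(path, 'r') as fd:
    text = fd.read()
# process text line by line
data = {}
last_key = ''
for line in text.split('\n'):
    if not line.strip():
        continue  # skip blank lines (e.g. after the final newline)
    if line.startswith(' '):
        # indented line: continuation of the previous value
        data[last_key] += ' ' + line.strip(' -:\n\t')
    else:
        key, _, content = line.partition(' ')
        # drop the '-' or ':' separator from both the key and the value
        key = key.rstrip('-:')
        data[key] = content.strip().lstrip('-:').strip()
        last_key = key
# check result
for k, v in data.items():
    print(k, v)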
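Note that a single dictionary keeps only the last entry, because every entry reuses the same keys (Date, Location, Status, Author). Since the goal is a workbook with one row per entry, a list of dictionaries may be a better fit. The following is a minimal sketch, not a definitive implementation. It assumes that the only field names are the four shown in the question, that every entry starts with a "Date" line, and that any line not starting with a field name continues the previous value. The names `FIELDS`, `split_field` and `parse_entries` are made up for this example.

```python
# Field names taken from the sample entries in the question (assumption:
# there are no other fields in the real file).
FIELDS = ("Date", "Location", "Status", "Author")


def split_field(line):
    """Return (key, value) if the line starts a known field, else None."""
    for field in FIELDS:
        if line.startswith(field):
            rest = line[len(field):].lstrip()
            # The sample uses both '-' and ':' as separators.
            if rest[:1] in ("-", ":"):
                return field, rest[1:].strip()
    return None


def parse_entries(text):
    """Parse the text into a list of dicts, one dict per entry."""
    records = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parsed = split_field(line)
        if parsed is not None:
            key, value = parsed
            # Assumption: a "Date" line marks the start of a new entry.
            if key == "Date" or current is None:
                current = {}
                records.append(current)
            current[key] = value
            last_key = key
        elif current is not None and last_key is not None:
            # Not a field line: append it to the previous value.
            current[last_key] += " " + line
    return records


sample = """Date - 01031991
Location- Worcester, MA
Status - very long sentence that
      continues over the next few
      lines like so.
Author- Security 87
Date- 01071991
Location - Fort-Devens, MA
Status: another long sentence
Author - Security 92
"""

records = parse_entries(sample)
for record in records:
    print(record)
```

Each dictionary in `records` corresponds to one row of the workbook. Because lines are matched by field name rather than by indentation, hyphens inside values such as "Fort-Devens" are left alone. The list can then be written out with `csv.DictWriter` from the standard library, or, if pandas is installed, passed to `pandas.DataFrame(records)` and saved with `to_excel`.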