I have a large text file with daily entries:
Date - 01031991
Location- Worcester, MA
Status - very long sentence that
continues over the next few
lines like so.
Author- Security 87
Date- 01071991
Location - Fort-Devens, MA
Status - another long%$@ sent%$#ence
with space and all typ&^%$es
of characters.
Author - Security 92
I tried to load the data into a dictionary, but realized that would not work because dictionaries can't have duplicate keys. How should I approach this problem? I want an Excel workbook with the field names as columns, like this:
Date     | Location        | Status        | Author
____________________________________________________________
01031991 | Worcester, MA   | long sentence | Security 87
01071991 | Fort-Devens, MA | long sentence | Security 92
____________________________________________________________
CodePudding user response:
A non-regex approach:
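# read text from file
path = 'entries.txt'  # placeholder: replace with your file name
with open(path, 'r') as fd:
    text = fd.read()
# process text line by line
data = {}
last_key = ''
for line in text.split('\n'):
    if line.startswith(' '):
        # continuation line (assumed to be indented): append to the previous field
        data[last_key] += ' ' + line.strip(' -:\n\t')
    else:
        key, _, content = line.partition(' ')
        key = key.rstrip('-:')  # "Location-" and "Location" become the same key
        data[key] = content.strip(' -:')
        last_key = key
# check result (a repeated key overwrites the earlier entry)
for k, v in data.items():
    print(k, v)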
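A single dictionary can hold only one value per key, so one way around the duplicate-key problem is to build a list of dictionaries, one per entry. The following is a minimal sketch, not a definitive solution. It assumes every entry begins with a Date line, that the only field names are Date, Location, Status and Author, and that any line not starting with one of those names followed by a dash continues the previous field. The names FIELDS, split_field, parse_entries and write_csv are made up for this illustration.

```python
import csv
import sys

# Assumed field names, taken from the sample in the question.
FIELDS = ("Date", "Location", "Status", "Author")

# The sample from the question, inlined so the sketch runs on its own.
SAMPLE = """\
Date - 01031991
Location- Worcester, MA
Status - very long sentence that
continues over the next few
lines like so.
Author- Security 87
Date- 01071991
Location - Fort-Devens, MA
Status - another long%$@ sent%$#ence
with space and all typ&^%$es
of characters.
Author - Security 92
"""


def split_field(line):
    """Return (name, value) if the line starts a known field, else None."""
    name, sep, value = line.partition('-')
    name = name.strip()
    if sep and name in FIELDS:
        return name, value.strip()
    return None


def parse_entries(text):
    """Group the lines into one dictionary per entry."""
    records = []
    current = None
    last_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        field = split_field(line)
        if field is None:
            # Not a field line: treat it as a continuation of the last field.
            if current is not None and last_name is not None:
                current[last_name] += ' ' + line
            continue
        name, value = field
        if name == "Date" or current is None:
            # Assumption: a Date line marks the start of a new entry.
            current = {}
            records.append(current)
        current[name] = value
        last_name = name
    return records


def write_csv(records, fd):
    """Write the records as CSV with one column per field."""
    writer = csv.DictWriter(fd, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    write_csv(parse_entries(SAMPLE), sys.stdout)
```

Excel can open the resulting CSV directly. If pandas and openpyxl are installed, pandas.DataFrame(records).to_excel('entries.xlsx', index=False) should write a real workbook from the same list; the file name here is only a placeholder.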