I'm trying to convert a .txt file like this to a Python dictionary:
18.10.2021 List Display
-----------------------------
Selected Documents: 3
-----------------------------
| Document|Description |Lng|
-----------------------------
| VLX82304|Unit 523435 |EN |
| VLX82340|Self 339304|EN |
| VLX98234|Can 522018 |EN |
-----------------------------
I'd like to create a dictionary like so:
MyDict = {
    "Document": ["VLX82304", "VLX82340", "VLX98234"],
    "Description": ["Unit 523435", "Self 339304", "Can 522018"],
    [...]
}
I have the following:
fileInfo = {"Document", "Description", "Lng"}

# > CLEANING UP .txt FILE
LocalFile_LINES = []  # list to store file lines

# Read file
with open(".txt", 'r') as fp:
    # read and store all lines into list
    LocalFile_LINES = fp.readlines()
NumLines = len(LocalFile_LINES)

# Write file
with open("CLEANED.txt", 'w') as fp:
    # iterate each line
    for number, line in enumerate(LocalFile_LINES):
        # delete line 5 and 8. or pass any Nth line you want to remove
        if number not in [0,1,2,3,4,5,NumLines-1, NumLines]:
            # The "NumLines-1" removes the actual "------", whereas NumLines removes a space at the end
            fp.write(line)

# Getting num lines of newly CLEANED .txt file
txtCLEANED = open("CLEANED.txt", "r")
NumLines_CLEANED = txtCLEANED.readlines()
CLEANED_len = len(NumLines_CLEANED)
listIndex = list( range(0,CLEANED_len-1) )  # Creates a series of numbers

# > CONVERTED.CLEANED.txt FILE TO PY DICT
Delimited = []
with open("CLEANED.txt", 'r') as fp:
    for line in fp:
        Delimited = line.split("|")
        newItem = str( Delimited[1] )
        fileInfo["Document"].append( newItem )
But then I get an error at the very last line: "TypeError: 'set' object is not subscriptable", when it should be a list.
Could anyone please provide any input on how to resolve this issue?
CodePudding user response:
This (or something like this) should work for this use case. Note that I used a string instead of reading from a file, as it's a bit easier to test with.
from pprint import pprint
file_contents = """
18.10.2021 List Display
-----------------------------
Selected Documents: 3
-----------------------------
| Document|Description |Lng|
-----------------------------
| VLX82304|Unit 523435 |EN |
| VLX82340|Self 339304|EN |
| VLX98234|Can 522018 |EN |
-----------------------------
""".strip()
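# Split on the last three separator lines; the two middle pieces are the header row and the data rows
_, col_headers, cols, _ = file_contents.rsplit('-----------------------------', 3)
# Header names, with the outer pipes and surrounding whitespace removed
col_headers = [h.strip() for h in col_headers.strip('\n|').split('|')]
# One list of cell strings per data row
cols = [line.strip(' |').split('|') for line in cols.strip().split('\n')]
# Transpose the rows into columns and pair each column with its header
my_dict = dict(zip(col_headers, zip(*cols)))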
pprint(my_dict)
Output:
{'Description': ('Unit 523435 ', 'Self 339304', 'Can 522018 '),
'Document': ('VLX82304', 'VLX82340', 'VLX98234'),
'Lng': ('EN', 'EN', 'EN')}
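The values above are tuples, and some cells keep their padding (for example 'Unit 523435 '). If lists with trimmed cells are preferred, as in the dictionary the question asks for, something along these lines may work. This is only a sketch: parse_table is a hypothetical helper name, and it assumes every table row starts with '|' while the title, counter and dashed lines do not.

```python
sample = """
18.10.2021 List Display
-----------------------------
Selected Documents: 3
-----------------------------
| Document|Description |Lng|
-----------------------------
| VLX82304|Unit 523435 |EN |
| VLX82340|Self 339304|EN |
| VLX98234|Can 522018 |EN |
-----------------------------
"""


def parse_table(text):
    """Return {header: [cell, ...]} for a pipe-delimited table in `text`."""
    # Keep only table rows (assumption: they are the lines starting with '|'),
    # drop the outer pipes, and trim the whitespace around every cell.
    rows = [
        [cell.strip() for cell in line.strip().strip('|').split('|')]
        for line in text.splitlines()
        if line.strip().startswith('|')
    ]
    # The first table row holds the column names, the rest hold the data
    headers, *data = rows
    # zip(*data) transposes rows into columns; list() turns each tuple into a list
    return {header: list(column) for header, column in zip(headers, zip(*data))}


my_dict = parse_table(sample)
print(my_dict["Description"])  # ['Unit 523435', 'Self 339304', 'Can 522018']
```

Filtering on the leading '|' avoids depending on the exact length of the dashed separator lines.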
NB: if you have a text file and want to read its contents into a string, you can do it like this:
with open('my_file.txt') as in_file:
    file_contents = in_file.read()
# file_contents is now a string with the contents of the file
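As for the TypeError in the question: braces around bare values, as in {"Document", "Description", "Lng"}, create a set, and a set cannot be indexed with fileInfo["Document"]. A dict needs key: value pairs, so each key has to be given an empty list before appending to it. A minimal sketch of that fix, where sample_lines is a hypothetical stand-in for the data lines of the cleaned file and file_info is just an illustrative name:

```python
# Hypothetical stand-in for the data lines read from the cleaned file
sample_lines = [
    "| VLX82304|Unit 523435 |EN |",
    "| VLX82340|Self 339304|EN |",
    "| VLX98234|Can 522018 |EN |",
]

# A dict literal needs key: value pairs; each key starts with an empty list
file_info = {"Document": [], "Description": [], "Lng": []}

for line in sample_lines:
    # Drop the outer pipes, split into the three cells, and trim each one
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # Dicts keep insertion order, so the keys line up with the columns
    for key, cell in zip(file_info, cells):
        file_info[key].append(cell)

print(file_info["Document"])  # ['VLX82304', 'VLX82340', 'VLX98234']
```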