Converting .txt file to Python dictionary

I'm trying to convert a .txt file like this to a Python dictionary:

18.10.2021       List Display                                                    
-----------------------------
 Selected Documents:        3
-----------------------------
|  Document|Description |Lng|
-----------------------------
|  VLX82304|Unit 523435 |EN |
|  VLX82340|Self  339304|EN |
|  VLX98234|Can  522018 |EN |
-----------------------------

I'd like to create a dictionary like so:

MyDict = {
    "Document": ["VLX82304", "VLX82340", "VLX98234"],
    "Description": ["Unit 523435", "Self  339304", "Can  522018"],
    [...] }

I have the following:

fileInfo = {"Document", "Description", "Lng"}

# > CLEANING UP .txt FILE 

LocalFile_LINES = []      # list to store file lines
# Read file
with open(".txt", 'r') as fp:
    # read and store all lines in a list
    LocalFile_LINES = fp.readlines()
    NumLines = len(LocalFile_LINES)

# Write file
with open("CLEANED.txt", 'w') as fp:
    # iterate each line
    for number, line in enumerate(LocalFile_LINES):
        # skip the listed line indices; add any Nth line you want to remove
        if number not in [0,1,2,3,4,5,NumLines-1, NumLines]:
            # The "NumLines-1" removes the actual "------", whereas NumLines removes a space at the end
            fp.write(line)

# Getting num lines of newly CLEANED .txt file
txtCLEANED = open("CLEANED.txt", "r")
NumLines_CLEANED = txtCLEANED.readlines()
CLEANED_len = len(NumLines_CLEANED)
listIndex = list( range(0,CLEANED_len-1) )    # Creates a series of numbers 


# > CONVERTED.CLEANED.txt FILE TO PY DICT

Delimited = []
with open("CLEANED.txt", 'r') as fp:
    for line in fp:
        Delimited = line.split("|")
        newItem = str( Delimited[1] )
        fileInfo["Document"].append( newItem )

But then I get an error at the very last line, "TypeError: 'set' object is not subscriptable", when it should be a list ...

Could anyone please provide any input on how to resolve this issue?

CodePudding user response:

This (or something like this) should work for this use case. Note that I used a string instead of reading from a file, as it's a bit easier to test with.

from pprint import pprint


file_contents = """
18.10.2021       List Display
-----------------------------
 Selected Documents:        3
-----------------------------
|  Document|Description |Lng|
-----------------------------
|  VLX82304|Unit 523435 |EN |
|  VLX82340|Self  339304|EN |
|  VLX98234|Can  522018 |EN |
-----------------------------
""".strip()

_, col_headers, cols, _ = file_contents.rsplit('-----------------------------', 3)
col_headers = [h.strip() for h in col_headers.strip('\n|').split('|')]
cols = [line.strip(' |').split('|') for line in cols.strip().split('\n')]

my_dict = dict(zip(col_headers, zip(*cols)))

pprint(my_dict)

Output:

{'Description': ('Unit 523435 ', 'Self  339304', 'Can  522018 '),
 'Document': ('VLX82304', 'VLX82340', 'VLX98234'),
 'Lng': ('EN', 'EN', 'EN')}
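
If you would rather have lists with the padding trimmed, as in the dictionary sketched in the question, a small variation on the same idea might look like the following. It is only a sketch: the helper name parse_table is made up for illustration, and it assumes that every table row starts with '|' and that no cell contains a '|' itself.

```python
def parse_table(text):
    """Parse a '|'-delimited report into a dict of column name -> list of values."""
    # Keep only the table rows; the date line, the count line and the
    # dashed separators do not start with '|'.
    rows = [
        [cell.strip() for cell in line.strip().strip('|').split('|')]
        for line in text.splitlines()
        if line.strip().startswith('|')
    ]
    headers, *records = rows  # the first table row holds the column names
    # zip(*records) turns the list of rows into a sequence of columns
    return {header: list(column) for header, column in zip(headers, zip(*records))}


sample = """
18.10.2021       List Display
-----------------------------
 Selected Documents:        3
-----------------------------
|  Document|Description |Lng|
-----------------------------
|  VLX82304|Unit 523435 |EN |
|  VLX82340|Self  339304|EN |
|  VLX98234|Can  522018 |EN |
-----------------------------
"""

my_dict = parse_table(sample)
```

Each cell is stripped individually here, so the trailing spaces seen in the output above go away, while the double space inside "Self  339304" is kept because it sits in the middle of the value.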

NB: if you have a text file and want to read its contents into a string, you can do it like this:

with open('my_file.txt') as in_file:
    file_contents = in_file.read()

# file_contents should now be a string with the contents of the file
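
As for the TypeError in the question: braces that contain only comma-separated values, as in fileInfo = {"Document", "Description", "Lng"}, create a set rather than a dict, and a set cannot be indexed with fileInfo["Document"]. One way around it, shown here as a minimal sketch, is to start from a dict that maps each column name to an empty list and append to those lists. The names columns, file_info and cleaned are illustrative, and the io.StringIO object only stands in for the opened CLEANED.txt file so that the snippet runs on its own.

```python
import io

columns = ["Document", "Description", "Lng"]
# A dict mapping each column name to its own empty list.
# (Braces with bare values and no "key: value" pairs would build a set.)
file_info = {name: [] for name in columns}

# Stand-in for open("CLEANED.txt"); assumed to hold only the data rows.
cleaned = io.StringIO(
    "|  VLX82304|Unit 523435 |EN |\n"
    "|  VLX82340|Self  339304|EN |\n"
    "|  VLX98234|Can  522018 |EN |\n"
)

for line in cleaned:
    line = line.strip()
    if not line.startswith("|"):
        continue  # skip blank lines and dashed separators
    cells = [cell.strip() for cell in line.strip("|").split("|")]
    # Pair each cell with its column name and append it to that column's list
    for name, cell in zip(columns, cells):
        file_info[name].append(cell)
```

With a real file, the loop body stays the same and only the io.StringIO(...) call is swapped for open("CLEANED.txt") inside a with block.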