Python - Data not appending to list & missing dict key after reading file

I have a text file called 'machinary.txt'. Below is some sample data. I only want to grab the data starting from each 'Section' line, together with the corresponding Input, Output, BIT_MATCH, Input Obtained and Output Obtained lines below it. Anything before the first 'Section' is ignored.

Each 'BIT_MATCH' in the text file refers to the 'Input->' and 'Output->' below it. Section_A has two inputs and two outputs.

Any 'empty' values are defined as 'N/A'.

'machinary.txt'

unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011

Code I've done:

import pprint

with open("machinary.txt", "r") as file:
    flag = False
    headers = 'Section,Input,Output,Bit Matched'.split(',')

    sub_dict = dict.fromkeys(headers,'N/A')
    main_dict = {}
    bit_match_list = []
    input_list =[]
    output_list = []

    for eachline in file:

        if 'Section' in eachline:
            flag = True
            sub_dict['Section'] = eachline.strip().split()[-1]

        if flag:
            if 'BIT_MATCH' in eachline:
                bit_match_list.append(eachline.strip())
                bit_match_list.append(next(file).strip())
                bit_match_list.append(next(file).strip())
                sub_dict['Bit Matched'] = bit_match_list
                #sub_dict['Input bit match']=next(file).strip()
                #sub_dict['Output bit match'] = next(file).strip()

            if 'Input->' in eachline:
                input_list.append(eachline.strip())
                sub_dict['Input'] = input_list
                output_list.append (next(file).strip())
                sub_dict['Output'] = output_list
                main_dict[sub_dict['Section']] = sub_dict
                sub_dict = dict.fromkeys(headers, 'N/A')
                bit_match_list = []
                input_list = []
                output_list = []

pprint.pprint (main_dict)

Output from code above:

{'N/A': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                         'Input Obtained: 2',
                         'Output Obtained: 2',
                         'BIT_MATCH: It matched at bit: 2',
                         'Input Obtained: 3',
                         'Output Obtained: 3'],
         'Input': ['Input-> 320'],
         'Output': ['Output-> 321'],
         'Section': 'N/A'},
 'Section_A_Machinary': {'Bit Matched': 'N/A',
                         'Input': ['Input-> 101'],
                         'Output': ['Output-> 010'],
                         'Section': 'Section_A_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}

Expected output:

{'Section_A_Machinary': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                                         'Input Obtained: 2',
                                         'Output Obtained: 2',
                                         'BIT_MATCH: It matched at bit: 2',
                                         'Input Obtained: 3',
                                         'Output Obtained: 3'],
                         'Input': ['Input-> 101', 'Input-> 320'],
                         'Output': ['Output-> 010', 'Output-> 321'],
                         'Section': 'Section_A_Machinary'},
 'Section_B_Machinary': {'Bit Matched': 'N/A',
                         'Input': 'N/A',
                         'Output': 'N/A',
                         'Section': 'Section_B_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}

Sorry for the lengthy post. Somehow the code misses Section_B, and the 'Input->' and 'Output->' values for Section_A are not being appended the way I want. Is there a simple way to solve this, preferably without altering the code above too much? Thanks!

CodePudding user response:

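To parse the text into the required structure you can use the re module. The first pattern splits the text into chunks that each start at 'Section' and run up to the next 'Section' (re.S lets . match newlines); the other patterns then pull the individual fields out of each chunk: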

txt = """
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
"""

import re
from itertools import chain

out = {}

# Each chunk starts at "Section" and runs up to (not including) the next "Section".
for section in re.findall(r"Section(?:(?!Section).)+", txt, flags=re.S):
    bit_matches = re.findall(r"BIT_MATCH.*", section)
    inp_out = re.findall(
        r"Input Obtained.*?Output Obtained.*?$", section, flags=re.S | re.M
    )

    inputs = re.findall(r"Input->.*", section)
    outputs = re.findall(r"Output->.*", section)

    name = section.splitlines()[0]

    out[name] = {
        "Bit Matched": list(
            chain.from_iterable(
                (a, *b.splitlines()) for a, b in zip(bit_matches, inp_out)
            )
        )
        or "N/A",
        "Input": inputs or "N/A",
        "Output": outputs or "N/A",
        "Section": name,
    }

print(out)

Prints:

{
    "Section_A_Machinary": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 1",
            "Input Obtained: 2",
            "Output Obtained: 2",
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 3",
            "Output Obtained: 3",
        ],
        "Input": ["Input-> 101", "Input-> 320"],
        "Output": ["Output-> 010", "Output-> 321"],
        "Section": "Section_A_Machinary",
    },
    "Section_B_Machinary": {
        "Bit Matched": "N/A",
        "Input": "N/A",
        "Output": "N/A",
        "Section": "Section_B_Machinary",
    },
    "Section_C_Distillery": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 0",
            "Output Obtained: 0",
        ],
        "Input": ["Input-> 011"],
        "Output": ["Output-> 011"],
        "Section": "Section_C_Distillery",
    },
}
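
If you would rather stay close to the original loop, here is a minimal sketch of a line-by-line version. The loop in the question stores and resets sub_dict every time it sees an 'Input->' line, so the second input of Section_A lands in a fresh dict whose section is still 'N/A', and Section_B (which has no input at all) is overwritten before it is ever stored. The sketch instead registers a record as soon as its 'Section' line is read and keeps appending to it until the next one. The function name parse_sections and the shortened sample are my own, and it assumes an 'Output->' line always directly follows its 'Input->' line, and that the two 'Obtained' lines directly follow each 'BIT_MATCH' line.

```python
import io
import pprint

FIELDS = ("Input", "Output", "Bit Matched")


def parse_sections(lines):
    """Group Input/Output/BIT_MATCH lines under the most recent 'Section' line."""
    lines = iter(lines)   # lets next() consume the follow-up lines
    main_dict = {}
    current = None        # stays None until the first 'Section' line

    for line in lines:
        line = line.strip()
        if "Section" in line:
            # Register the record immediately, so a section without any
            # data (Section_B in the sample) is still kept.
            current = {field: [] for field in FIELDS}
            current["Section"] = line.split()[-1]
            main_dict[current["Section"]] = current
        elif current is None:
            continue      # ignore everything before the first section
        elif "BIT_MATCH" in line:
            # Assumption: the next two lines are 'Input Obtained' / 'Output Obtained'.
            current["Bit Matched"] += [
                line,
                next(lines, "").strip(),
                next(lines, "").strip(),
            ]
        elif "Input->" in line:
            current["Input"].append(line)
            # Assumption: the next line is the matching 'Output->' line.
            current["Output"].append(next(lines, "").strip())

    # Lists that stayed empty become 'N/A'.
    for record in main_dict.values():
        for field in FIELDS:
            record[field] = record[field] or "N/A"
    return main_dict


# Shortened sample in the same layout as machinary.txt.
sample = """\
unwanted data
Input-> 000
Output-> 000
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
Input-> 320
Output-> 321
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
"""

result = parse_sections(io.StringIO(sample))
pprint.pprint(result)
```

To run it on the real file, pass the open file object instead of the StringIO, e.g. with open("machinary.txt") as file: main_dict = parse_sections(file).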