Python - Data not appending to list & Missing dict key after Read File

Time:10-15

I have a text file called 'machinary.txt'. From this sample data I only want to grab each 'Section' line together with the input, output, bit match, input obtained and output obtained lines that follow it. Anything before the first 'Section' line is ignored.

The 'BIT_MATCH' lines in the file refer to the 'Input->' and 'Output->' lines below them. Section_A has two inputs and two outputs.

Any missing value should be recorded as 'N/A'.

'machinary.txt'

unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011

The code I have so far:

import pprint

with open("machinary.txt", "r") as file:
    flag = False
    headers = 'Section,Input,Output,Bit Matched'.split(',')

    sub_dict = dict.fromkeys(headers,'N/A')
    main_dict = {}
    bit_match_list = []
    input_list =[]
    output_list = []

    for eachline in file:

        if 'Section' in eachline:
            flag = True
            sub_dict['Section'] = eachline.strip().split()[-1]

        if flag:
            if 'BIT_MATCH' in eachline:
                bit_match_list.append(eachline.strip())
                bit_match_list.append(next(file).strip())
                bit_match_list.append(next(file).strip())
                sub_dict['Bit Matched'] = bit_match_list
                #sub_dict['Input bit match']=next(file).strip()
                #sub_dict['Output bit match'] = next(file).strip()

            if 'Input->' in eachline:
                input_list.append(eachline.strip())
                sub_dict['Input'] = input_list
                output_list.append (next(file).strip())
                sub_dict['Output'] = output_list
                main_dict[sub_dict['Section']] = sub_dict
                sub_dict = dict.fromkeys(headers, 'N/A')
                bit_match_list = []
                input_list = []
                output_list = []

pprint.pprint (main_dict)

Output from code above:

{'N/A': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                         'Input Obtained: 2',
                         'Output Obtained: 2',
                         'BIT_MATCH: It matched at bit: 2',
                         'Input Obtained: 3',
                         'Output Obtained: 3'],
         'Input': ['Input-> 320'],
         'Output': ['Output-> 321'],
         'Section': 'N/A'},
 'Section_A_Machinary': {'Bit Matched': 'N/A',
                         'Input': ['Input-> 101'],
                         'Output': ['Output-> 010'],
                         'Section': 'Section_A_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}

Expected output:

{'Section_A_Machinary': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                         'Input Obtained: 2',
                         'Output Obtained: 2',
                         'BIT_MATCH: It matched at bit: 2',
                         'Input Obtained: 3',
                         'Output Obtained: 3'],
                         'Input': ['Input-> 101', 'Input-> 320'],
                         'Output': ['Output-> 010', 'Output-> 321'],
                         'Section': 'Section_A_Machinary'},
 'Section_B_Machinary': {'Bit Matched': 'N/A',
                         'Input': 'N/A',
                         'Output': 'N/A',
                         'Section': 'Section_B_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}

Sorry for the lengthy post. The result is missing Section_B, and the 'Input->' and 'Output->' lines for Section_A are not being appended the way I want. Is there a simple way to fix this, preferably without changing the code above too much? Thanks!

Answer:

To parse the text into the required structure, you can use the re module:

txt = """
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
"""

import re
from itertools import chain

out = {}

# Each match runs from one "Section" up to (but not including) the next one.
for section in re.findall(r"Section(?:(?!Section).)+", txt, flags=re.S):
    bit_matches = re.findall(r"BIT_MATCH.*", section)
    inp_out = re.findall(
        r"Input Obtained.*?Output Obtained.*?$", section, flags=re.S | re.M
    )

    inputs = re.findall(r"Input->.*", section)
    outputs = re.findall(r"Output->.*", section)

    name = section.splitlines()[0]

    out[name] = {
        "Bit Matched": list(
            chain.from_iterable(
                (a, *b.splitlines()) for a, b in zip(bit_matches, inp_out)
            )
        )
        or "N/A",
        "Input": inputs or "N/A",
        "Output": outputs or "N/A",
        "Section": name,
    }

print(out)

Prints:

{
    "Section_A_Machinary": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 1",
            "Input Obtained: 2",
            "Output Obtained: 2",
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 3",
            "Output Obtained: 3",
        ],
        "Input": ["Input-> 101", "Input-> 320"],
        "Output": ["Output-> 010", "Output-> 321"],
        "Section": "Section_A_Machinary",
    },
    "Section_B_Machinary": {
        "Bit Matched": "N/A",
        "Input": "N/A",
        "Output": "N/A",
        "Section": "Section_B_Machinary",
    },
    "Section_C_Distillery": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 0",
            "Output Obtained: 0",
        ],
        "Input": ["Input-> 011"],
        "Output": ["Output-> 011"],
        "Section": "Section_C_Distillery",
    },
}
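
If you would rather stay close to the line-by-line loop from the question, here is a minimal sketch of one way to do it. It is not the code from the answer above. The idea is that the original loop only stores a section when it reaches an 'Input->' line and then resets everything, which is why Section_B (no 'Input->' line) never shows up and why the second input of Section_A ends up under 'N/A'. Registering each section as soon as its header is seen avoids both problems. The names parse_sections, _append and SAMPLE are made up for this example, and SAMPLE is a trimmed copy of the data from the question.

```python
import pprint

HEADERS = ("Section", "Input", "Output", "Bit Matched")

# Trimmed copy of the sample data from the question (assumption: same layout).
SAMPLE = """
unwanted data
Input-> 000
Output-> 000
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
Input-> 320
Output-> 321
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
Input-> 011
Output-> 011
"""


def _append(record, key, value):
    """Turn the 'N/A' placeholder into a list on first use, then append."""
    if record[key] == "N/A":
        record[key] = []
    record[key].append(value)


def parse_sections(lines):
    """Group Input/Output/BIT_MATCH lines under the most recent section header."""
    main_dict = {}
    current = None  # stays None until the first 'Section' line is seen
    lines = iter(lines)  # a single iterator, so next() below skips ahead

    for line in lines:
        line = line.strip()
        if "Section" in line:
            # Store the section immediately so that an empty one (Section_B)
            # still appears with 'N/A' for every field.
            current = dict.fromkeys(HEADERS, "N/A")
            current["Section"] = line.split()[-1]
            main_dict[current["Section"]] = current
        elif current is None:
            continue  # everything before the first section is ignored
        elif "BIT_MATCH" in line:
            # The next two lines are 'Input Obtained' and 'Output Obtained'.
            _append(current, "Bit Matched", line)
            _append(current, "Bit Matched", next(lines, "").strip())
            _append(current, "Bit Matched", next(lines, "").strip())
        elif line.startswith("Input->"):
            _append(current, "Input", line)
        elif line.startswith("Output->"):
            _append(current, "Output", line)
    return main_dict


if __name__ == "__main__":
    pprint.pprint(parse_sections(SAMPLE.splitlines()))
```

To run it on the real file, pass the open file object instead of the sample lines, for example: with open("machinary.txt") as file: result = parse_sections(file).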