I have a text file called 'machinary.txt'. It contains sample data from which I only want to grab each 'Section' line and the corresponding Input, Output, BIT_MATCH, Input Obtained and Output Obtained lines below it. Anything before the first 'Section' is ignored.
Each 'BIT_MATCH' in the file refers to the 'Input->' and 'Output->' below it. Section_A has two inputs and two outputs.
Any empty values are defined as 'N/A'.
'machinary.txt'
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
Code I've done:
import pprint
with open("machinary.txt", "r") as file:
    flag = False
    headers = 'Section,Input,Output,Bit Matched'.split(',')
    sub_dict = dict.fromkeys(headers, 'N/A')
    main_dict = {}
    bit_match_list = []
    input_list = []
    output_list = []
    for eachline in file:
        if 'Section' in eachline:
            flag = True
            sub_dict['Section'] = eachline.strip().split()[-1]
        if flag:
            if 'BIT_MATCH' in eachline:
                bit_match_list.append(eachline.strip())
                bit_match_list.append(next(file).strip())
                bit_match_list.append(next(file).strip())
                sub_dict['Bit Matched'] = bit_match_list
                #sub_dict['Input bit match']=next(file).strip()
                #sub_dict['Output bit match'] = next(file).strip()
            if 'Input->' in eachline:
                input_list.append(eachline.strip())
                sub_dict['Input'] = input_list
                output_list.append(next(file).strip())
                sub_dict['Output'] = output_list
                main_dict[sub_dict['Section']] = sub_dict
                sub_dict = dict.fromkeys(headers, 'N/A')
                bit_match_list = []
                input_list = []
                output_list = []
pprint.pprint(main_dict)
Output from code above:
{'N/A': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                         'Input Obtained: 2',
                         'Output Obtained: 2',
                         'BIT_MATCH: It matched at bit: 2',
                         'Input Obtained: 3',
                         'Output Obtained: 3'],
         'Input': ['Input-> 320'],
         'Output': ['Output-> 321'],
         'Section': 'N/A'},
 'Section_A_Machinary': {'Bit Matched': 'N/A',
                         'Input': ['Input-> 101'],
                         'Output': ['Output-> 010'],
                         'Section': 'Section_A_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}
Expected output:
{'Section_A_Machinary': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                                         'Input Obtained: 2',
                                         'Output Obtained: 2',
                                         'BIT_MATCH: It matched at bit: 2',
                                         'Input Obtained: 3',
                                         'Output Obtained: 3'],
                         'Input': ['Input-> 101', 'Input-> 320'],
                         'Output': ['Output-> 010', 'Output-> 321'],
                         'Section': 'Section_A_Machinary'},
 'Section_B_Machinary': {'Bit Matched': 'N/A',
                         'Input': 'N/A',
                         'Output': 'N/A',
                         'Section': 'Section_B_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}
Sorry for the lengthy wording. The code misses Section_B entirely, and the 'Input->' and 'Output->' lines for Section_A are not appended to the same lists as I want: the second pair and both BIT_MATCH blocks end up under an 'N/A' key instead. Is there a simple way to solve this, preferably without altering the code above too much? Thanks!
CodePudding user response:
To parse the text into the required structure, you can use the re module:
txt = """
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
"""
import re
from itertools import chain
out = {}
# Each match runs from one "Section" up to, but not including, the next one.
for section in re.findall(r"Section(?:(?!Section).)+", txt, flags=re.S):
    bit_matches = re.findall(r"BIT_MATCH.*", section)
    # Each "Input Obtained ... Output Obtained ..." pair, spanning two lines.
    inp_out = re.findall(
        r"Input Obtained.*?Output Obtained.*?$", section, flags=re.S | re.M
    )
    inputs = re.findall(r"Input->.*", section)
    outputs = re.findall(r"Output->.*", section)
    # The section name is the rest of the line that starts the match.
    name = section.splitlines()[0]
    out[name] = {
        # Interleave every BIT_MATCH line with its two "Obtained" lines.
        "Bit Matched": list(
            chain.from_iterable(
                (a, *b.splitlines()) for a, b in zip(bit_matches, inp_out)
            )
        )
        or "N/A",
        "Input": inputs or "N/A",
        "Output": outputs or "N/A",
        "Section": name,
    }
print(out)
Prints:
{
    "Section_A_Machinary": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 1",
            "Input Obtained: 2",
            "Output Obtained: 2",
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 3",
            "Output Obtained: 3",
        ],
        "Input": ["Input-> 101", "Input-> 320"],
        "Output": ["Output-> 010", "Output-> 321"],
        "Section": "Section_A_Machinary",
    },
    "Section_B_Machinary": {
        "Bit Matched": "N/A",
        "Input": "N/A",
        "Output": "N/A",
        "Section": "Section_B_Machinary",
    },
    "Section_C_Distillery": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 0",
            "Output Obtained: 0",
        ],
        "Input": ["Input-> 011"],
        "Output": ["Output-> 011"],
        "Section": "Section_C_Distillery",
    },
}
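If you would rather stay close to the line-by-line loop from the question, the following is a minimal sketch of one way to do it, not the only one. The original loop stores and resets its record on every 'Input->' line, so a section with no input (Section_B) is never stored, and everything after the first input of Section_A lands in a fresh record whose name is still 'N/A'. The sketch instead starts a new record as soon as a 'Section' line is seen and keeps appending to it until the next one. The names parse_sections, add and SAMPLE are made up for this illustration, and it assumes, as in the sample data, that an 'Output->' line directly follows each 'Input->' line and that two 'Obtained' lines directly follow each 'BIT_MATCH' line.

```python
from pprint import pprint

HEADERS = ("Section", "Input", "Output", "Bit Matched")

# Sample data copied from the question (hypothetical name SAMPLE).
SAMPLE = """\
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
"""


def parse_sections(lines):
    """Group Input/Output/BIT_MATCH lines under the most recent 'Section' line."""
    lines = iter(lines)  # lets next() pull the lines that follow a match
    result = {}
    current = None  # stays None until the first 'Section' line is seen

    def add(key, value):
        # Swap the 'N/A' placeholder for a list on first use, then append.
        if current[key] == "N/A":
            current[key] = []
        current[key].append(value)

    for line in lines:
        line = line.strip()
        if "Section" in line:
            # Store the new record immediately, so empty sections are kept too.
            name = line.split()[-1]
            current = dict.fromkeys(HEADERS, "N/A")
            current["Section"] = name
            result[name] = current
        elif current is None:
            continue  # ignore everything before the first section
        elif "BIT_MATCH" in line:
            add("Bit Matched", line)
            add("Bit Matched", next(lines, "").strip())  # assumed 'Input Obtained'
            add("Bit Matched", next(lines, "").strip())  # assumed 'Output Obtained'
        elif "Input->" in line:
            add("Input", line)
            add("Output", next(lines, "").strip())  # assumed 'Output->' line
    return result


result = parse_sections(SAMPLE.splitlines())
pprint(result)
```

To read from the file instead of the inline string, an open file object can be passed directly, for example parse_sections(file) inside the with open("machinary.txt") block, since the function only needs an iterable of lines.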