Read text file to store data in a nested list

I have the following input file with the format as below.

SOME TEXT SOME TEXT SOME TEXT
SOME TEXT
RETCODE = 0

                        A = 1
                        B = 5
                        C = 3

                        D = 0
                        E = 0

                        D = 4
                        E = 1
                        G = 1


blah blah


---    ENDBLOCK

SOME TEXT
RETCODE = 0
                        A = 3
                        B = 2
                        C = 8
                        D = 6
                        E = 9
                        F = 3



blah blah

blah blah

---    ENDBLOCK

SOME TEXT
RETCODE = 0
                        A = 7
                        B = 2
                        C = 2
                        D = 9
                        E = 0

                        D = 1
                        E = 4

                        D = 7
                        E = 0
                        F = 1

                        D = 1
                        G = 8


blah blah

blah blah

---    ENDBLOCK

The data comes in blocks. The data in each block begins with a line of the form A = <value>, and the block ends with a line saying --- ENDBLOCK. Each block contains one or more sub-blocks, and sub-blocks are separated by blank lines.

For example, in the first block, the first sub-block is A=1, B=5, C=3, the second sub-block is D=0, E=0, and the third sub-block is D=4, E=1, G=1.

My goal is to get a nested list with the values of A to G for each sub-block. If a letter has no value in a sub-block, its entry should be empty. The desired output is below.

[
    [
        [1,5,3,,,,],[,,,0,0,,],[,,,4,1,,1]
    ],
    [
        [3,2,8,6,9,3,]
    ],
    [
        [7,2,2,9,0,,],[,,,1,4,,],[,,,7,0,1,],[,,,1,,,8]
    ]
]

My current code reads the file and stores some of the data, but I'm still far from the desired output. Thanks for any help.

f=open("file.txt","r").read().split("ENDBLOCK")
rows = []
for line in f:
    for subline in line.split("\n\n"):
        if " = " in subline and not "RETCODE" in subline:
            rows.append(subline.replace(" ","").split())

>>> rows
[
    ['A=1', 'B=5', 'C=3'], ['D=0', 'E=0'], ['D=4', 'E=1', 'G=1'], 
    ['D=1', 'E=4'], ['D=7', 'E=0', 'F=1'], ['D=1', 'G=8']
]

CodePudding user response：

You can do something like this:

import re

BLOCK_END = "---    ENDBLOCK"
# Optional indentation, a letter A-G, " = ", then one or more digits.
LETTER_TO_VALUE_REGEX = re.compile(r"\s*(?P<letter>[A-G]) = (?P<number>\d+)")
LETTERS = ["A", "B", "C", "D", "E", "F", "G"]
SUB_BLOCK_END = "\n\n"

# Read the whole input file into one string.
with open("file.txt") as f:
    text = f.read()

top_list = []
for block in text.split(BLOCK_END):
    sub_list = []
    for sub_block in block.split(SUB_BLOCK_END):
        letters_to_numbers_dict = {}
        for line in sub_block.split("\n"):
            match = LETTER_TO_VALUE_REGEX.match(line)
            if match:
                letters_to_numbers_dict[match["letter"]] = int(match["number"])
        # Skip sub-blocks without any A-G assignment (headers, "blah blah").
        if letters_to_numbers_dict:
            sub_list.append([letters_to_numbers_dict.get(letter, "") for letter in LETTERS])
    # Skip chunks without data, such as the text after the last ENDBLOCK.
    if sub_list:
        top_list.append(sub_list)

After the loop, top_list is your desired nested list, with an empty string wherever a letter has no value.
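
To check this kind of parsing against the sample from the question, here is a minimal self-contained sketch. The function name parse_blocks and the inline sample string are assumptions made for illustration, not part of the original answer. It also assumes that blank lines may contain stray whitespace, so it splits sub-blocks with re.split instead of a literal "\n\n".

```python
import re

LETTERS = "ABCDEFG"
# One assignment line: optional indentation, a letter A-G, "=", an integer.
ASSIGNMENT = re.compile(r"^\s*([A-G])\s*=\s*(\d+)\s*$")


def parse_blocks(text):
    """Return one list per block, each holding one 7-slot list per sub-block."""
    result = []
    for block in text.split("ENDBLOCK"):
        sub_blocks = []
        # Blank lines (possibly containing stray spaces) separate sub-blocks.
        for chunk in re.split(r"\n\s*\n", block):
            values = {}
            for line in chunk.splitlines():
                m = ASSIGNMENT.match(line)
                if m:
                    values[m.group(1)] = int(m.group(2))
            # Keep only chunks that contain at least one A-G assignment.
            if values:
                sub_blocks.append([values.get(letter, "") for letter in LETTERS])
        if sub_blocks:
            result.append(sub_blocks)
    return result


# Shortened version of the input from the question (first two blocks).
sample = """SOME TEXT
RETCODE = 0

    A = 1
    B = 5
    C = 3

    D = 0
    E = 0

    D = 4
    E = 1
    G = 1

blah blah

---    ENDBLOCK

SOME TEXT
RETCODE = 0
    A = 3
    B = 2
    C = 8
    D = 6
    E = 9
    F = 3

blah blah

---    ENDBLOCK
"""

print(parse_blocks(sample))
```

The RETCODE line is ignored because the pattern only accepts a single letter A to G before the equals sign, so no separate RETCODE check is needed.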

CodePudding user response：

Check out this solution:

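LETTERS = ["A", "B", "C", "D", "E", "F", "G"]

with open("file.txt", "r") as f:
    lines = f.readlines()

rows = []
second_row = []   # sub-blocks of the current block
third_row = {}    # letter -> value for the current sub-block
for line in lines:
    if 'RETCODE' not in line and '=' in line:
        letter, value = (part.strip() for part in line.split('=', 1))
        if letter in LETTERS and value.isdigit():
            third_row[letter] = int(value)
    elif line.strip() == "" or 'ENDBLOCK' in line:
        # A blank line or ENDBLOCK closes the current sub-block.
        if third_row:
            second_row.append([third_row.get(letter, "") for letter in LETTERS])
            third_row = {}
        # ENDBLOCK also closes the current block.
        if 'ENDBLOCK' in line and second_row:
            rows.append(second_row)
            second_row = []

print(rows)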
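
Note that a literal such as [1,5,3,,,,] from the question is not valid Python, so the nested list needs a placeholder for missing values. To display the result in the question's notation, here is a minimal sketch. The helper names format_sub_block and format_block are hypothetical, and the sketch assumes missing values are stored as empty strings.

```python
def format_sub_block(values):
    """Join one sub-block's values with commas, leaving missing slots empty."""
    return "[" + ",".join(str(v) for v in values) + "]"


def format_block(block):
    """Join the formatted sub-blocks of one block with commas."""
    return ",".join(format_sub_block(sub) for sub in block)


# Example data: empty strings mark letters that have no value.
blocks = [
    [[1, 5, 3, "", "", "", ""], ["", "", "", 0, 0, "", ""]],
    [[3, 2, 8, 6, 9, 3, ""]],
]

for block in blocks:
    print(format_block(block))
```

Using None instead of "" as the placeholder would also work for storage, but this formatter would then need to map None to an empty string before joining.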