Having difficulty extracting test titles and test results from a list of text

Time:04-03

I've got a list of lines of text made up of alternating section headers and section content. I want to parse it line by line and identify the sections and their associated content (to eventually throw together into a dictionary).

The trouble I am having is in figuring out how to parse the lines into pairs based only on iterating through the list and looking for the headers. Every time I try I get very close, but somehow my sections end up misaligned.

I think my algorithm should be as follows:

(0) Assume no header has been identified at the beginning of the search; hence, any content seen will be ignored until a section header is encountered.

(1) When "in" a section (i.e. a section header has been encountered), accumulate all following section content and append it together, until such a time as a new section header is seen.

(2) Upon encountering the new section header, any following lines should be considered as part of the new section.

(3) Some sections may only have a header, and hence have blank content. Others may span a single or multiple lines.

In other words, given this:

garbage
Section-A-Header
section A content line 1
section A content line 2
section A content line 3
Section-B-Header
section B content line 1
section B content line 2
Section-C-Header
Section-D-Header
section D content line 1
section D content line 2
section D content line 3

...I would like to be able to construct:

{Section-A-Header: section A content line 1 + section A content line 2 + section A content line 3}
{Section-B-Header: section B content line 1 + section B content line 2}
{Section-C-Header: None}
{Section-D-Header: section D content line 1 + section D content line 2 + section D content line 3}

Could anyone help me figure out a solid implementation?

CodePudding user response:

I am not sure exactly what issue you are facing with this.

Here is some pseudocode to take inspiration from:


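def is_section_header(line):
    # Placeholder logic to identify a header.
    # Assumption: headers start with "Section-", as in the sample input.
    return line.startswith("Section-")


last_header = None  # no header seen yet
output = {}
with open("sections.txt") as file:
    for line in map(str.strip, file):
        if is_section_header(line):
            last_header = line
            output[last_header] = ""
        elif last_header is not None:
            # Content before the first header is ignored (step 0)
            output[last_header] += line + "\n"

print(output)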

CodePudding user response:

This would be my approach:

result = dict()

with open('foo.txt') as foo:
    section = None
    for line in map(str.strip, foo):
        # identify start of section
        if line.startswith('Section-'):
            section = line
            result[section] = None
        else:
            if section:
                if result[section]:
                    result[section].append(line)
                else:
                    result[section] = [line]

Result:

{
  "Section-A-Header": [
    "section A content line 1",
    "section A content line 2",
    "section A content line 3"
  ],
  "Section-B-Header": [
    "section B content line 1",
    "section B content line 2"
  ],
  "Section-C-Header": None,
  "Section-D-Header": [
    "section D content line 1",
    "section D content line 2",
    "section D content line 3"
  ]
}

Note:

It is written like this only because OP wants None for empty sections.
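
If each section's content should end up as a single string, as in the expected output in the question, the same loop can collect lines in a list and join them at the end. This is only a sketch: `parse_sections`, its `is_header` parameter, and the `Section-` prefix test are names and assumptions made up for illustration, and joining with a single space is a guess at the intended separator. It works on a list of lines already in memory, which is what the question describes.

```python
def parse_sections(lines, is_header=lambda s: s.startswith("Section-")):
    """Group lines under the most recent header and join each section's content."""
    sections = {}
    current = None  # no header seen yet, so leading lines are ignored
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if is_header(line):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Header-only sections map to None; others to their lines joined by a space
    return {h: " ".join(body) if body else None for h, body in sections.items()}


sample = [
    "garbage",
    "Section-A-Header",
    "section A content line 1",
    "section A content line 2",
    "Section-C-Header",
    "Section-D-Header",
    "section D content line 1",
]
print(parse_sections(sample))
```

Collecting into a list first and converting empty lists to None in one place keeps the loop free of the nested None checks.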
