Python - Convert a table from a text file to a dictionary

I tried to find information online but could not find any regarding what I want to do.

I have a text document containing a table that I want to convert to a dictionary. The table looks like this:

                  Test Name   Cycles   Operations      Result  Errors   Last Error
 Network: eno1 (172.0.10.1)   9289     81.751 Million  PASS    0        No errors
 Network: eno2 (172.0.10.1)   9289     81.750 Million  PASS    0        No errors
 Network: eno3 (172.0.10.1)   9362     82.387 Million  PASS    0        No errors
 Network: eno4 (172.0.10.1)   9411     82.818 Million  PASS    0        No errors
     USB 2: PMOKZQ35 (1:10)   5        58.328 Million  PASS    0        No errors
    USB 3: PMU34QG452 (2:1)   2        2.690 Billion   PASS    0        No errors
    USB 3: PMU356Q2K0 (2:3)   2        2.403 Billion   PASS    0        No errors
         Serial Port: ttyS1   4        224200          PASS    0        No errors

I want to create a dictionary so that indexing it by a column name gives me that column's values, for example:

ability_test['Test Name'][0]
# return Network: eno1 (172.0.10.1)
ability_test['Cycles'][1]
# return 9289

So far I have only been able to turn the information into a dictionary but have not been able to split the information.

My code

ability_test = {}
with open(f"result.txt", "r") as f:
    for line in f:
        count = 2
        try:
            k, v = line.strip().split(":")
            if k in ability_test.keys():
                ability_test[k + f"{count}".strip()] = v.strip()
                count = count + 1
            else:
                ability_test[k.strip()] = v.strip()
        except:
            pass

I would appreciate any information or suggestion on how to do this

CodePudding user response：

This is quite simple with pandas and its read_fwf function (read fixed-width file). By default it infers the fixed column widths, and it gets them right in this case. If it does not, there are optional parameters to guide the function.

import pandas as pd

df = pd.read_fwf('result.txt')
print(df)
print(df['Test Name'][0])
print(df['Cycles'][1])

Output:

                    Test Name  Cycles      Operations Result  Errors Last Error
0  Network: eno1 (172.0.10.1)    9289  81.751 Million   PASS       0  No errors
1  Network: eno2 (172.0.10.1)    9289  81.750 Million   PASS       0  No errors
2  Network: eno3 (172.0.10.1)    9362  82.387 Million   PASS       0  No errors
3  Network: eno4 (172.0.10.1)    9411  82.818 Million   PASS       0  No errors
4      USB 2: PMOKZQ35 (1:10)       5  58.328 Million   PASS       0  No errors
5     USB 3: PMU34QG452 (2:1)       2   2.690 Billion   PASS       0  No errors
6     USB 3: PMU356Q2K0 (2:3)       2   2.403 Billion   PASS       0  No errors
7          Serial Port: ttyS1       4          224200   PASS       0  No errors
Network: eno1 (172.0.10.1)
9289

CodePudding user response：

There are a number of issues.

First, you need to capture the header names in their own list so that you can keep track of them.

Secondly, the data appears to be splittable on runs of two or more whitespace characters. You can use a regex for this: re.compile(r"\s\s+")

e.g.

import re

splitter = re.compile(r"\s\s+")

ability_test = {}
with open("result.txt", "r") as f:
    # Use `next` to pop off the first line of headers
    headers = splitter.split(next(f).strip())
    for header in headers:
        ability_test[header] = []
    for line in f:
        # For each value, associate it with the proper list of headers
        values = splitter.split(line.strip())
        for header, value in zip(headers, values):
            ability_test[header].append(value)

for header, values in ability_test.items():
    print(header, values)

Outputs:

Test Name ['Network: eno1 (172.0.10.1)', 'Network: eno2 (172.0.10.1)', 'Network: eno3 (172.0.10.1)', 'Network: eno4 (172.0.10.1)', 'USB 2: PMOKZQ35 (1:10)', 'USB 3: PMU34QG452 (2:1)', 'USB 3: PMU356Q2K0 (2:3)', 'Serial Port: ttyS1']
Cycles ['9289', '9289', '9362', '9411', '5', '2', '2', '4']
Operations ['81.751 Million', '81.750 Million', '82.387 Million', '82.818 Million', '58.328 Million', '2.690 Billion', '2.403 Billion', '224200']
Result ['PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS']
Errors ['0', '0', '0', '0', '0', '0', '0', '0']
Last Error ['No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors']
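Note that every value in the output above is a string, whereas the question expects ability_test['Cycles'][1] to return 9289. A minimal sketch of a post-processing step, assuming that only the Cycles and Errors columns hold plain integers (the helper name convert_int_columns is made up for illustration):

```python
def convert_int_columns(table, columns):
    """Return a copy of a dict-of-lists table with the named columns cast to int."""
    converted = dict(table)
    for name in columns:
        converted[name] = [int(value) for value in table[name]]
    return converted


# Abbreviated sample taken from the first two rows of the output above.
ability_test = {
    "Test Name": ["Network: eno1 (172.0.10.1)", "Network: eno2 (172.0.10.1)"],
    "Cycles": ["9289", "9289"],
    "Errors": ["0", "0"],
}

ability_test = convert_int_columns(ability_test, ["Cycles", "Errors"])
print(ability_test["Cycles"][1])  # 9289 (now an int)
```

The Operations column mixes plain numbers with values such as "81.751 Million", so it is left as text here.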

This data is still a little difficult to work with. I think a better pattern might be to output one dict per line. You could do so like this:

import re

splitter = re.compile(r"\s\s+")

ability_test = []
with open("result.txt", "r") as f:
    headers = splitter.split(next(f).strip())
    for line in f:
        values = splitter.split(line.strip())
        ability_test.append(dict(zip(headers, values)))

for item in ability_test:
    print(item)

{'Test Name': 'Network: eno1 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.751 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno2 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.750 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno3 (172.0.10.1)', 'Cycles': '9362', 'Operations': '82.387 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno4 (172.0.10.1)', 'Cycles': '9411', 'Operations': '82.818 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 2: PMOKZQ35 (1:10)', 'Cycles': '5', 'Operations': '58.328 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU34QG452 (2:1)', 'Cycles': '2', 'Operations': '2.690 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU356Q2K0 (2:3)', 'Cycles': '2', 'Operations': '2.403 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Serial Port: ttyS1', 'Cycles': '4', 'Operations': '224200', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
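To illustrate why one dict per row can be easier to work with, here is a short sketch of a lookup by test name and two simple filters. It uses three rows copied from the output above; the variable names rows, by_name, usb_tests and all_passed are only illustrative.

```python
# Three rows copied from the output above.
rows = [
    {"Test Name": "Network: eno1 (172.0.10.1)", "Cycles": "9289",
     "Operations": "81.751 Million", "Result": "PASS", "Errors": "0",
     "Last Error": "No errors"},
    {"Test Name": "USB 3: PMU34QG452 (2:1)", "Cycles": "2",
     "Operations": "2.690 Billion", "Result": "PASS", "Errors": "0",
     "Last Error": "No errors"},
    {"Test Name": "Serial Port: ttyS1", "Cycles": "4",
     "Operations": "224200", "Result": "PASS", "Errors": "0",
     "Last Error": "No errors"},
]

# Index the rows by test name for direct lookups.
by_name = {row["Test Name"]: row for row in rows}
print(by_name["Serial Port: ttyS1"]["Cycles"])  # 4

# Pick out the names of all USB tests.
usb_tests = [row["Test Name"] for row in rows if row["Test Name"].startswith("USB")]
print(usb_tests)  # ['USB 3: PMU34QG452 (2:1)']

# Check whether every test passed.
all_passed = all(row["Result"] == "PASS" for row in rows)
print(all_passed)  # True
```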

CodePudding user response：

First, make sure every line of the input file is split by the same delimiter. The code below assumes the file is tab-separated ('\t'):

import collections

splitter = '\t'
with open("./result.txt", "r") as f:
    # The first line holds the column names; start each column as an empty list
    ability_test = collections.OrderedDict([(key, []) for key in f.readline().strip().split(splitter)])
    print(ability_test.keys())
    for line in f:
        # Append each value to the list of the column it belongs to
        for column, value in zip(ability_test.values(), line.strip().split(splitter)):
            column.append(value)
print(ability_test)
print(ability_test['Test Name'][0])
print(ability_test['Cycles'][1])
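If the file really is tab-separated, the standard library's csv module can do the splitting as well. The following is a sketch under that assumption; the in-memory sample string is a hypothetical tab-separated stand-in for result.txt, not the original file.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for result.txt.
sample = (
    "Test Name\tCycles\tOperations\tResult\tErrors\tLast Error\n"
    "Network: eno1 (172.0.10.1)\t9289\t81.751 Million\tPASS\t0\tNo errors\n"
    "Network: eno2 (172.0.10.1)\t9289\t81.750 Million\tPASS\t0\tNo errors\n"
)

# DictReader takes the column names from the first line.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
ability_test = {name: [] for name in reader.fieldnames}
for row in reader:
    # Append each value to the list of the column it belongs to.
    for name, value in row.items():
        ability_test[name].append(value)

print(ability_test["Test Name"][0])  # Network: eno1 (172.0.10.1)
print(ability_test["Cycles"][1])     # 9289 (still a string)
```

To read from disk instead, the io.StringIO(sample) argument could be replaced with a file object opened with newline="".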