I searched online but could not find anything that covers what I want to do.
I have a text document that contains a table, and I want to convert that table to a dictionary. The table looks like this:
Test Name                   Cycles  Operations      Result  Errors  Last Error
Network: eno1 (172.0.10.1)  9289    81.751 Million  PASS    0       No errors
Network: eno2 (172.0.10.1)  9289    81.750 Million  PASS    0       No errors
Network: eno3 (172.0.10.1)  9362    82.387 Million  PASS    0       No errors
Network: eno4 (172.0.10.1)  9411    82.818 Million  PASS    0       No errors
USB 2: PMOKZQ35 (1:10)      5       58.328 Million  PASS    0       No errors
USB 3: PMU34QG452 (2:1)     2       2.690 Billion   PASS    0       No errors
USB 3: PMU356Q2K0 (2:3)     2       2.403 Billion   PASS    0       No errors
Serial Port: ttyS1          4       224200          PASS    0       No errors
I want to create a dictionary so that indexing it by a column name gives me that column's values, for example:
ability_test['Test Name'][0]
# return Network: eno1 (172.0.10.1)
ability_test['Cycles'][1]
# return 9289
So far I have only been able to turn the information into a dictionary, but I have not been able to split it into columns.
My code:
ability_test = {}
with open(f"result.txt", "r") as f:
    for line in f:
        count = 2
        try:
            k, v = line.strip().split(":")
            if k in ability_test.keys():
                ability_test[k + f"{count}".strip()] = v.strip()
                count = count + 1
            else:
                ability_test[k.strip()] = v.strip()
        except:
            pass
I would appreciate any information or suggestions on how to do this.
CodePudding user response:
This is quite simple with pandas and its read_fwf function (read fixed-width formatted lines). By default it infers the column widths, and it gets them right in this case. If it does not, optional parameters such as colspecs and widths let you specify the columns yourself.
import pandas as pd
df = pd.read_fwf('result.txt')
print(df)
print(df['Test Name'][0])
print(df['Cycles'][1])
Output:
                    Test Name  Cycles      Operations  Result  Errors  Last Error
0  Network: eno1 (172.0.10.1)    9289  81.751 Million    PASS       0   No errors
1  Network: eno2 (172.0.10.1)    9289  81.750 Million    PASS       0   No errors
2  Network: eno3 (172.0.10.1)    9362  82.387 Million    PASS       0   No errors
3  Network: eno4 (172.0.10.1)    9411  82.818 Million    PASS       0   No errors
4      USB 2: PMOKZQ35 (1:10)       5  58.328 Million    PASS       0   No errors
5     USB 3: PMU34QG452 (2:1)       2   2.690 Billion    PASS       0   No errors
6     USB 3: PMU356Q2K0 (2:3)       2   2.403 Billion    PASS       0   No errors
7          Serial Port: ttyS1       4          224200    PASS       0   No errors
Network: eno1 (172.0.10.1)
9289
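Since the question asks for a dictionary, the DataFrame can also be converted with to_dict. The following is a minimal sketch rather than part of the original answer: it assumes the columns are padded with spaces, and it reads a shortened in-memory sample through io.StringIO in place of result.txt.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for result.txt (assumes space-padded columns).
sample = io.StringIO(
    "Test Name                   Cycles  Operations      Result  Errors  Last Error\n"
    "Network: eno1 (172.0.10.1)  9289    81.751 Million  PASS    0       No errors\n"
    "Network: eno2 (172.0.10.1)  9289    81.750 Million  PASS    0       No errors\n"
    "Serial Port: ttyS1          4       224200          PASS    0       No errors\n"
)

df = pd.read_fwf(sample)

# orient="list" maps each column name to a plain Python list of its values.
ability_test = df.to_dict(orient="list")

print(ability_test["Test Name"][0])
print(ability_test["Cycles"][1])
```

With this shape, ability_test['Test Name'][0] and ability_test['Cycles'][1] work as in the question. Note that pandas parses the Cycles column as integers, not strings.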
CodePudding user response:
There are a number of issues.
First, you need to capture the header names in their own list so that you can keep track of them.
Secondly, the data appears to be splittable on runs of two or more whitespace characters. You can use a regex for this: re.compile(r"\s\s+")
e.g.
import re
# Split on runs of two or more whitespace characters
splitter = re.compile(r"\s\s+")
ability_test = {}
with open("result.txt", "r") as f:
    # Use `next` to pop off the first line of headers
    headers = splitter.split(next(f).strip())
    for header in headers:
        ability_test[header] = []
    for line in f:
        # For each value, associate it with the proper list of headers
        values = splitter.split(line.strip())
        for header, value in zip(headers, values):
            ability_test[header].append(value)
for header, values in ability_test.items():
    print(header, values)
Outputs:
Test Name ['Network: eno1 (172.0.10.1)', 'Network: eno2 (172.0.10.1)', 'Network: eno3 (172.0.10.1)', 'Network: eno4 (172.0.10.1)', 'USB 2: PMOKZQ35 (1:10)', 'USB 3: PMU34QG452 (2:1)', 'USB 3: PMU356Q2K0 (2:3)', 'Serial Port: ttyS1']
Cycles ['9289', '9289', '9362', '9411', '5', '2', '2', '4']
Operations ['81.751 Million', '81.750 Million', '82.387 Million', '82.818 Million', '58.328 Million', '2.690 Billion', '2.403 Billion', '224200']
Result ['PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS']
Errors ['0', '0', '0', '0', '0', '0', '0', '0']
Last Error ['No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors']
This data is still a little difficult to work with. I think a better pattern might be to output one dict per line. You could do so like this:
import re
splitter = re.compile(r"\s\s+")
ability_test = []
with open("result.txt", "r") as f:
    headers = splitter.split(next(f).strip())
    for line in f:
        values = splitter.split(line.strip())
        # Pair each header with the value in the same position
        ability_test.append(dict(zip(headers, values)))
for item in ability_test:
    print(item)
Outputs:
{'Test Name': 'Network: eno1 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.751 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno2 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.750 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno3 (172.0.10.1)', 'Cycles': '9362', 'Operations': '82.387 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno4 (172.0.10.1)', 'Cycles': '9411', 'Operations': '82.818 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 2: PMOKZQ35 (1:10)', 'Cycles': '5', 'Operations': '58.328 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU34QG452 (2:1)', 'Cycles': '2', 'Operations': '2.690 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU356Q2K0 (2:3)', 'Cycles': '2', 'Operations': '2.403 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Serial Port: ttyS1', 'Cycles': '4', 'Operations': '224200', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
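All parsed values are strings. If numeric comparisons or lookups by test name are needed, the rows can be post-processed. The sketch below is an illustration, not part of the original answer: the helper name to_typed and the choice of columns to convert are assumptions, and the two rows are copied from the output above.

```python
# Two rows in the shape produced above (one dict per line, all values strings).
ability_test = [
    {'Test Name': 'Network: eno1 (172.0.10.1)', 'Cycles': '9289',
     'Operations': '81.751 Million', 'Result': 'PASS', 'Errors': '0',
     'Last Error': 'No errors'},
    {'Test Name': 'Serial Port: ttyS1', 'Cycles': '4',
     'Operations': '224200', 'Result': 'PASS', 'Errors': '0',
     'Last Error': 'No errors'},
]


def to_typed(row):
    """Return a copy of row with the integer-like columns converted to int."""
    typed = dict(row)
    for key in ('Cycles', 'Errors'):
        typed[key] = int(typed[key])
    return typed


typed_rows = [to_typed(row) for row in ability_test]

# Index the rows by test name for direct lookup.
by_name = {row['Test Name']: row for row in typed_rows}
print(by_name['Serial Port: ttyS1']['Cycles'])

# Collect the names of any tests that did not pass.
failed = [row['Test Name'] for row in typed_rows if row['Result'] != 'PASS']
print(failed)
```

The Operations column is left as a string here because its values mix plain counts with suffixes such as Million and Billion.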
CodePudding user response:
First, you should make sure every line of the input file is split by the same delimiter. The code below assumes that the file is tab-separated ('\t'):
import collections
spliter = '\t'
with open("./result.txt", "r") as f:
    # The first line holds the column names; start each column with an empty list
    ability_test = collections.OrderedDict([(key, []) for key in f.readline().strip().split(spliter)])
    print(ability_test.keys())
    for line in f:
        # Append each value to the list of the column in the same position
        for l, v in zip(ability_test.values(), line.strip().split(spliter)):
            l.append(v)
print(ability_test)
print(ability_test['Test Name'][0])
print(ability_test['Cycles'][1])
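If the file really is tab-separated, the standard library's csv module can do the splitting as well. This is a sketch under that assumption; it reads a shortened in-memory sample instead of ./result.txt, and the sample rows are illustrative.

```python
import csv
import io

# Hypothetical tab-separated stand-in for ./result.txt.
sample = io.StringIO(
    "Test Name\tCycles\tOperations\tResult\tErrors\tLast Error\n"
    "Network: eno1 (172.0.10.1)\t9289\t81.751 Million\tPASS\t0\tNo errors\n"
    "Network: eno2 (172.0.10.1)\t9289\t81.750 Million\tPASS\t0\tNo errors\n"
)

# DictReader takes the column names from the first line.
reader = csv.DictReader(sample, delimiter="\t")

# Build one list per column, keyed by the header names.
ability_test = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, value in row.items():
        ability_test[name].append(value)

print(ability_test["Test Name"][0])
print(ability_test["Cycles"][1])
```

As with the manual split, every value stays a string, so ability_test['Cycles'][1] is '9289' and not the integer 9289.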