I have a .txt file that is structured somewhat like a JSON dictionary tree, and I intend to convert it to JSON using Python. I believe that to do this, I first need to convert the .txt file into a Python dictionary, which I am having trouble with. The structure of the .txt file is shown below. This is not the entire file; it may continue like this for a while, with nested dictionaries.
outer:
  middle:
    identity: data1
    types: data2
    name: data3
    region: data4
    motion: data5
    geometry_motion: data6
    roughness:
      height: data7
      constant: data8
    velocity:
      types: data9
      value: data10
The output should eventually be JSON, but I'm more concerned with getting it into a Python dict that looks something like this:
{'outer': {'middle': {'identity': data1, 'types': data2, etc.}}}
My attempts to solve this so far have involved using the readlines() method to convert the file to a list of its lines, and splitting the lines at the colon using line.split(':'). The following code shows this.
with open(datafile) as file:
    lines = file.readlines()
lines = [line.strip().split(':', 1) for line in lines]
Output:
[['outer', ''], ['middle', ''], ['identity', ' data1'], etc.]
I then tried to iterate over the lines: if the second element in a line was '', the first element in that line would become the key for a new dict containing the items at the next level of indentation. Here is where I have gotten quite stuck. I have toyed with the idea of using a recursive function that calls itself every time a new nested dict must be made, but I haven't gotten anywhere with that. Here is an attempt at some code, which does not work for a number of reasons but may give some insight into my thought process.
data_dict = {}
i = 0
def recurse(i):
    try:
        elements = lines[i]
    except IndexError:  # return the dict once the list runs out of elements
        return data_dict
    if elements[1] == '':
        i += 1
        data_dict[[elements[0]]] = recurse(i)
    else:  # if there is a second element in the list, make those key-value pairs in data_dict
        k, v = [element.strip() for element in elements]
        data_dict[k] = v
        i += 1
        recurse(i)
Please feel free to provide any advice or suggestions that would send me in the right direction.
This is my first question on Stack Overflow, and I understand that I may have left out some valuable information. Please let me know if there's anything else I can do or provide to help solve this problem.
CodePudding user response:
This text is valid YAML. You can use the yaml package to read it from a file or parse the string, and get a Python dictionary. After that, you can use the json module to serialize the dictionary into JSON.
import yaml
import json

with open('test.yml', 'r') as yaml_file:
    doc = yaml.load(yaml_file, Loader=yaml.FullLoader)
print(doc)
----------
{'outer': {'middle': {'identity': 'data1', 'types': 'data2', 'name': 'data3', 'region': 'data4', 'motion': 'data5', 'geometry_motion': 'data6', 'roughness': {'height': 'data7', 'constant': 'data8'}, 'velocity': {'types': 'data9', 'value': 'data10'}}}}
That dictionary can be written as JSON with json.dump:
with open('test.json', 'w') as json_file:
    json.dump(doc, json_file)
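If you want to inspect the JSON before writing it, a minimal sketch using only the standard library is shown here. The small doc dictionary is a hypothetical stand-in for what the YAML loader returns, and indent=2 is just one reasonable choice for readable output.

```python
import json

# Hypothetical stand-in for the dictionary returned by the YAML loader.
doc = {"outer": {"middle": {"identity": "data1", "roughness": {"height": "data7"}}}}

# json.dumps returns the JSON as a string; indent=2 pretty-prints it.
text = json.dumps(doc, indent=2)
print(text)

# Parsing the text again gives back an equal dictionary.
round_trip = json.loads(text)
```

The same indent argument is accepted by json.dump when writing to a file.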
CodePudding user response:
The answer by Panagiotis Kanavos ("use Python's yaml package") is probably the best.
Nevertheless, it might be instructive to try solving it without yaml.
I think one key problem in your approach is that you ignore the indentation, which means that
a:
  b: x
  c: y
results in the same list lines as
a:
  b: x
c: y
even though they should not have the same tree structure.
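To make the indentation point concrete, here is a small sketch. The two input strings and the helper names split_stripped and indent_widths are made up for illustration; they are not part of the question's code.

```python
# Two inputs that differ only in the indentation of the last line.
nested = "a:\n  b: x\n  c: y\n"
flat = "a:\n  b: x\nc: y\n"

def split_stripped(text):
    # Mirrors the question's approach: strip each line, then split at the first colon.
    return [line.strip().split(":", 1) for line in text.splitlines()]

def indent_widths(text):
    # Number of leading whitespace characters on each line.
    return [len(line) - len(line.lstrip()) for line in text.splitlines()]

# After strip() the two inputs can no longer be told apart.
same_after_strip = split_stripped(nested) == split_stripped(flat)
print(same_after_strip)

# Measuring the indentation before stripping keeps the difference.
print(indent_widths(nested), indent_widths(flat))
```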
Another problem is that you do not tell the recursive call which dictionary the new values should be put into.
I tried to build a solution similar to your attempt. Here it is:
lines = []
with open('data.txt', 'r') as fp:
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # split at the first colon only, so values may contain colons
        left, right = line.split(':', 1)
        indentation = len(left) - len(left.lstrip())
        lines.append((indentation, left.strip(), right.strip()))

def fill_dictionary(dictionary, i, previous_indentation):
    j = i
    while j < len(lines):
        indentation, key, val = lines[j]
        if indentation <= previous_indentation:
            return j  # go one level up
        elif not val:  # go one level deeper
            dictionary[key] = {}
            j = fill_dictionary(dictionary[key], j + 1, indentation)
        else:  # just enter the value
            dictionary[key] = val
            j += 1
    return j

result = {}
fill_dictionary(result, 0, -1)
print(result)
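As an alternative to recursion, the same idea can be sketched with an explicit stack of open dictionaries. This is only one possible variant, not the method above: the name parse_indented is hypothetical, and the sample text assumes two spaces per nesting level, which the original file may or may not use.

```python
import json

def parse_indented(text):
    """Parse 'key: value' lines, nested by indentation, into a dict."""
    root = {}
    # Each stack entry is (indentation of the key that opened it, its dict).
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indentation = len(raw) - len(raw.lstrip())
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()
        # Close every dict whose opening key is indented at least as far as this line.
        while stack[-1][0] >= indentation:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value  # plain key-value pair
        else:
            parent[key] = {}  # key with no value opens a nested dict
            stack.append((indentation, parent[key]))
    return root

# Assumed sample: a shortened version of the question's data, two spaces per level.
sample = """\
outer:
  middle:
    identity: data1
    roughness:
      height: data7
      constant: data8
    velocity:
      types: data9
"""
result = parse_indented(sample)
print(json.dumps(result, indent=2))
```

The stack plays the role of the recursive call chain: popping an entry corresponds to returning one level up.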