Python - Convert JSON-Formatted .txt file to python dictionary

Time:07-20

I have a .txt file that is structured somewhat like a JSON dictionary tree. I intend to convert it to JSON using Python. I believe that to do this, I first need to convert the .txt file into a Python dictionary, which I am having trouble with. The structure of the .txt file is shown below. This is not the entire file, and it may continue for a while like this with nested dictionaries.

 outer:
     middle:
          identity:        data1                                           
          types:           data2
          name:            data3
          region:          data4
          motion:          data5
          geometry_motion: data6
          roughness:
             height:       data7
             constant:     data8                                         
          velocity:
             types:        data9
             value:        data10

The output should eventually be JSON, but I'm more concerned with getting it into a Python dict that looks something like this.

{'outer': {'middle': {'identity': 'data1', 'types': 'data2', etc.}}}

My attempts to solve this so far have involved using the readlines() method to convert the file to a list of its lines, and splitting the lines by the colon using line.split(':'). The following code shows this.

with open(datafile) as file:
    lines = file.readlines()
    lines = [line.strip().split(':', 1) for line in lines]

output:
[['outer', ''], ['middle', ''], ['identity', '        data1'], etc.]

I then tried to iterate over the lines, and if the second element in the line was '', then the first element in that line would become the key for a new dict containing the rest of the items of a further indentation. Here is where I have gotten quite stuck. I have toyed with the idea of using a recursive function that calls itself every time a new nested dict must be made, but I haven't gotten anywhere with that. Here is an attempt at some code which does not work for a number of reasons but may give some insight on my thought process.

data_dict = {}
i = 0
def recurse(i):
    try: 
        elements = lines[i]
    except IndexError: # return the dict once the list runs out of elements
        return data_dict
    if elements[1] == '':
        i += 1
        data_dict[[elements[0]]] = recurse(i)
    else: # if there is a second element in the list, make those key-value pairs in data_dict
        k, v = [element.strip() for element in elements]
        data_dict[k] = v  
        i += 1
        recurse(i)

Please feel free to provide any advice or suggestions that would send me in the right direction.

This is my first question on Stack Overflow, and I understand that I may have left out some valuable information. Please let me know if there's anything else I can do or provide to help solve this problem.

CodePudding user response:

This text is valid YAML. You can use the yaml package (PyYAML) to read it from a file or parse it from a string, and get a Python dictionary. After that, you can use the json module to serialize the dictionary into JSON.

import yaml
import json

with open('test.yml', 'r') as yaml_file:
    doc = yaml.load(yaml_file, Loader=yaml.FullLoader)
print(doc)
----------
{'outer': {'middle': {'identity': 'data1', 'types': 'data2', 'name': 'data3', 'region': 'data4', 'motion': 'data5', 'geometry_motion': 'data6', 'roughness': {'height': 'data7', 'constant': 'data8'}, 'velocity': {'types': 'data9', 'value': 'data10'}}}}

That dictionary can be written as JSON with json.dump:

with open('test.json', 'w') as json_file:
    json.dump(doc, json_file)
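
If the JSON is wanted as a string rather than a file, or in a more readable layout, json.dumps with an indent argument may help. The following is only a minimal sketch: doc here is a hand-copied subset of the dictionary printed above, and the variable name text is chosen just for illustration.

```python
import json

# Subset of the parsed dictionary shown above, written out by hand for illustration.
doc = {'outer': {'middle': {'identity': 'data1',
                            'roughness': {'height': 'data7', 'constant': 'data8'}}}}

# indent=2 pretty-prints the nested structure; keys keep the dict's insertion order.
text = json.dumps(doc, indent=2)
print(text)

# Round trip: parsing the JSON text gives back an equal dictionary.
print(json.loads(text) == doc)
```

json.dump and json.dumps take the same formatting arguments, so indent=2 could also be passed when writing to the file.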

CodePudding user response:

The answer by Panagiotis Kanavos ("use Python's yaml package") is probably the best. Nevertheless, it might be instructive to try solving it without yaml.

I think one key problem in your approach is that you ignore the indentation, which means that

a:
  b: x
c: y

results in the same list of lines as

a:
  b: x
  c: y

even though they should have different tree structures.
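
This can be checked quickly. In the sketch below, the helper name to_pairs is hypothetical; it simply repeats the strip-and-split step from the question on the two small inputs above.

```python
def to_pairs(text):
    # Same strip-and-split step as in the question; strip() discards the indentation.
    return [line.strip().split(':', 1) for line in text.splitlines()]

flat = "a:\n  b: x\nc: y"      # c is a sibling of a
nested = "a:\n  b: x\n  c: y"  # c is a child of a

print(to_pairs(flat))
print(to_pairs(nested))
# Once the indentation is stripped, the two inputs can no longer be told apart.
print(to_pairs(flat) == to_pairs(nested))
```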

Another problem is that you do not tell the recursive call which dictionary the new values should be put into.

I tried to build a solution similar to your attempt. Here it is:

lines = []
with open('data.txt', 'r') as fp:
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # split on the first colon only, so values may contain colons
        left, right = line.split(':', 1)
        indentation = len(left) - len(left.lstrip())
        lines.append((indentation, left.strip(), right.strip()))

def fill_dictionary(dictionary, i, previous_indentation):
    # Fills dictionary starting at lines[i]; returns the index of the first unconsumed line.
    j = i
    while j < len(lines):
        indentation, key, val = lines[j]
        if indentation <= previous_indentation:
            return j   # go one level up
        elif not val:  # go one level deeper
            dictionary[key] = {}
            j = fill_dictionary(dictionary[key], j + 1, indentation)
        else:          # just enter the value
            dictionary[key] = val
            j += 1
    return j

result = {}
fill_dictionary(result, 0, -1)
print(result)
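
For comparison, the same idea can be written without recursion by keeping an explicit stack of the currently open dictionaries. This is only a sketch of an alternative, not part of the approach above: the function name parse_indented and the shortened sample string are made up for illustration, and it assumes every non-blank line contains a key followed by a colon.

```python
import json

def parse_indented(text):
    """Parse 'key: value' lines into nested dicts, using indentation for nesting."""
    root = {}
    # Stack of (indentation, dict) pairs; the root sits below any real indentation.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        left, _, right = raw.partition(':')  # split on the first colon only
        indent = len(left) - len(left.lstrip())
        key, val = left.strip(), right.strip()
        # Close every open dict that is not an ancestor of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if val:
            parent[key] = val
        else:
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root

# Shortened version of the file from the question.
sample = """\
 outer:
     middle:
          identity:        data1
          roughness:
             height:       data7
             constant:     data8
          velocity:
             types:        data9
"""

result = parse_indented(sample)
print(json.dumps(result, indent=2))
```

The stack plays the role of the call stack in the recursive version: popping an entry corresponds to returning "one level up".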