How to convert string to valid json or yaml

Time:01-25

I have a large script that parses JS with a dataframe entry, but to keep the question short I have extracted the part I need into a separate variable. The variable contains the following value:

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

I apply the following script and get this data:

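import json
import re

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

def parse_json(value):
    # Split into one chunk per object, then restore the "}" that split() removed
    arr = value.split("},")
    arr = [x + "}" for x in arr]
    arr[-1] = arr[-1][:-1]  # the last chunk kept its own "}", so drop the extra one

    return json.dumps({str(i): add_quotation_marks(x) for i, x in enumerate(arr)})

def add_quotation_marks(value):
    # Wrap every bare key (a word followed by ":") in double quotes
    words = re.findall(r'(\w+:)', value)
    for word in words:
        value = value.replace(word[:-1], f'"{word[:-1]}"')
    return json.loads(value)

print(parse_json(value))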
{"0": {"from": [3, 4], "to": [7, 4], "color": 2}, "1": {"from": [3, 6], "to": [10, 6], "color": 3}}

The script runs correctly, but I need a slightly different result. This is the result I want:

{
  "0": {
    "from": {
      "0": "3",
      "1": "4"
    },
    "to": {
      "0": "7",
      "1": "4"
    },
    "color": "2"
  },
  "1": {
    "from": {
      "0": "3",
      "1": "6"
    },
    "to": {
      "0": "10",
      "1": "6"
    },
    "color": "3"
  }
}

This is valid JSON and valid YAML. How can I do this?

CodePudding user response:

I'd suggest a regex approach in this case:

import re

res = []

# iterates over each "{from:...,to:...,color:...}" group separately
for obj in re.findall(r'\{([^}]+)}', value):
    item = {}

    # iterates over each "...:..." key-value pair separately
    for k, v in re.findall(r'(\w+):(\[[^]]+]|\d+)', obj):
        if v.startswith('['):
            v = v.strip('[]').split(',')

        item[k] = v

    res.append(item)

This produces this output in res:

[{'from': ['3', '4'], 'to': ['7', '4'], 'color': '2'}, {'from': ['3', '6'], 'to': ['10', '6'], 'color': '3'}]

Since your values can contain commas, trying to split on commas or other markers is fairly tricky, and using these regexes to match your desired values instead is more stable.
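The list in res still has to be reshaped into the index-keyed form asked for in the question. A minimal sketch of that last step, assuming the same value string and the same two regexes; to_indexed is a hypothetical helper name, not part of the original answer:

```python
import json
import re

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

def to_indexed(items):
    """Turn a list into a dict keyed by the stringified position."""
    return {str(i): item for i, item in enumerate(items)}

res = []
for obj in re.findall(r'\{([^}]+)}', value):
    item = {}
    for k, v in re.findall(r'(\w+):(\[[^]]+]|\d+)', obj):
        # Bracketed values become index-keyed dicts, scalars stay as strings.
        item[k] = to_indexed(v.strip('[]').split(',')) if v.startswith('[') else v
    res.append(item)

# Index the outer list as well, then serialize with the standard json module.
result = to_indexed(res)
print(json.dumps(result, indent=2))
```

Because the regex captures are already strings, no extra conversion is needed to get quoted values such as "3" and "10".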

CodePudding user response:

Here's the code that converts the value to your desired output.

import json

import json5  # pip install json5

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

def convert(str_value):
    str_value = f"[{str_value}]"  # wrap in [] so the objects form a single JSON5 array
    parsed_value = json5.loads(str_value)  # convert to a Python list of dicts
    output = {}  # create empty dict

    # Loop through the list of dicts. For each item, create a new dict
    # with the index as the key and the value as the value. If the value
    # is a list, convert it to a dict with the index as the key and the
    # value as the value. If the value is not a list, just add it to the dict.
    for i, d in enumerate(parsed_value):
        output[i] = {}
        for k, v in d.items():
            output[i][k] = {j: v[j] for j in range(len(v))} if isinstance(v, list) else v

    return output

# The standard json module turns the integer keys into quoted strings
# and, with indent=2, prints the nested layout shown below.
print(json.dumps(convert(value), indent=2))

Output

{
  "0": {
    "from": {
      "0": 3,
      "1": 4
    },
    "to": {
      "0": 7,
      "1": 4
    },
    "color": 2
  },
  "1": {
    "from": {
      "0": 3,
      "1": 6
    },
    "to": {
      "0": 10,
      "1": 6
    },
    "color": 3
  }
}

  • The json5 package can parse a JavaScript-style object literal (unquoted keys) into Python objects, so you don't have to do split("},{").
  • Wrapping the string in [ and ] turns the comma-separated objects into a single array.
  • json5.loads() then loads that string as a list of dictionaries.
  • Finally, loop through the list and convert it to the desired output format.
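
The output above keeps the numbers as integers, while the output requested in the question has every leaf as a string. A possible sketch of the loop step that covers both, written recursively and using only the standard library. parse_objects and normalize are hypothetical names, and parse_objects is only a simple stand-in for json5.loads that assumes bare word keys and numeric values like the ones in this input:

```python
import json
import re

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

def parse_objects(text):
    # Stand-in for json5.loads (assumption: keys are bare words and values
    # are numbers or arrays of numbers): quote the keys, wrap in [] and parse.
    quoted = re.sub(r'(\w+):', r'"\1":', text)
    return json.loads(f"[{quoted}]")

def normalize(node):
    # Lists become dicts keyed by position, dicts are walked recursively,
    # and every remaining leaf is converted to a string.
    if isinstance(node, list):
        return {str(i): normalize(x) for i, x in enumerate(node)}
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in node.items()}
    return str(node)

result = normalize(parse_objects(value))
print(json.dumps(result, indent=2))
```

The recursion means deeper nesting (a list inside a list, for example) is handled the same way, which the single-level loop in convert() does not do.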