Extracting information between two curly brackets in a file with Python

Time:03-07

I have a file that was written in a JSON structure but is not correctly formatted. The content looks similar to this:

[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]

Unlike many questions asked here before, the contents are all on a single line, so when I try to read the file line by line, readline() returns the whole thing.

I am trying to extract only the information between the curly brackets { }, including the brackets, and print it. I am able to open the file, but I can't find a way to read from a { to the matching }, then continue to the next { and }, and so on. I don't care about the square brackets, only the curly brackets. Also, the values differ in length, so I can't read a fixed number of characters after the opening bracket; the length is different for most sets of brackets.

Any guidance would be greatly appreciated.

CodePudding user response:

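import re

fileContent = "[{'key0':'value0' , 'key1':'value1', 'key2':'value2'}, {'key0':'value3', 'key1':'value4', 'key2':'value5'}, {'key0':'value6', 'key1':'value7', 'key2':'value8'}]"

pattern_with_braces = r'\{.*?\}'              # keeps the surrounding { }
pattern_without_braces = r'(?<=\{).*?(?=\})'  # drops the surrounding { }

# The question asks for the braces to be included, so use pattern_with_braces.
parts = re.findall(pattern_with_braces, fileContent)
for part in parts:
    print(part)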
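
Since the question is about a file rather than a hard-coded string, the same pattern can be applied to the whole file contents read in one go. The following is only a minimal sketch: the function name extract_brace_groups and the file name data.txt are assumptions for illustration, and the non-greedy pattern assumes the braces are never nested and never appear inside a value.

```python
import re

# Non-greedy match from each "{" to the nearest "}" (assumes no nested braces).
BRACE_GROUP = re.compile(r"\{.*?\}")


def extract_brace_groups(text):
    """Return every {...} group found in text, braces included."""
    return BRACE_GROUP.findall(text)


# Shortened version of the malformed sample from the question.
sample = ('[{"key0":"value0" , "key1":"value1", "key2":"value2"}, '
          '{"key0":"value3", "key1":"value4", "key2:"value5"}]')

groups = extract_brace_groups(sample)
for group in groups:
    print(group)

# For a real file (hypothetical name), read everything at once instead of
# line by line:
# with open("data.txt") as f:
#     groups = extract_brace_groups(f.read())
```

Because this works on raw text, it does not matter that the content is not valid JSON. If a group could span several lines, the pattern would need the re.DOTALL flag.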

CodePudding user response:

I suggest using the re module to repair each line, and then parsing the repaired line with json into a list of dictionaries:

import re
import json
with open("data.txt") as f:
  for line in f:
    # Add the closing quote that is missing after some keys: "key2: becomes "key2":
    modified = re.sub(r"({|\s)\"(\w+):", r'\1"\2":', line)
    records = json.loads(modified)  # a list of dictionaries
    print(records)

For your example, the code above prints:

[{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}, {'key0': 'value3', 'key1': 'value4', 'key2': 'value5'}, {'key0': 'value6', 'key1': 'value7', 'key2': 'value8'}]

You can then access the keys and values of each dictionary in this list.

Note that the "data.txt" file in the code above is as follows:

[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]
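
To illustrate what access to the keys and values looks like after the repair, here is a small sketch. The function name repair_and_parse is hypothetical, and the pattern assumes the only defect is a missing closing quote after a key, and that no value contains a quote-word-colon sequence preceded by whitespace.

```python
import json
import re

# Matches a key that lost its closing quote, e.g.  {"key:  or  <space>"key:
MISSING_QUOTE = re.compile(r'([{\s])"(\w+):')


def repair_and_parse(text):
    """Add the missing closing quote to keys, then parse the text as JSON."""
    repaired = MISSING_QUOTE.sub(r'\1"\2":', text)
    return json.loads(repaired)


# Shortened version of the malformed sample from the question.
sample = ('[{"key0":"value0" , "key1":"value1", "key2":"value2"}, '
          '{"key0":"value3", "key1":"value4", "key2:"value5"}]')

records = repair_and_parse(sample)
for record in records:
    for key, value in record.items():
        print(key, value)
```

Keys that are already quoted correctly are left alone, because in those the closing quote, not the colon, follows the key name.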

CodePudding user response:

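Try the json.loads method from Python's json module, which deserializes a str, bytes, or bytearray instance containing a JSON document into a Python object. (json.load is the variant that takes fp, a .read()-supporting text or binary file.) Both only work on valid JSON, so the missing quotes after the keys have to be fixed first.

To decode your JSON string once it is valid: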

import json

# Valid JSON: the closing quote after every "key2" has been added.
str_to_load = '[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2":"value5"}, {"key0":"value6", "key1":"value7", "key2":"value8"}]'
data = json.loads(str_to_load)  # a list of dictionaries

print(data[2]['key2'])

Output: value8
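
Because json.loads raises an exception on malformed input such as the original file, it can help to catch the error and see where parsing failed. This is a hedged sketch rather than part of the answer above; the helper name try_parse and the two sample strings are made up for illustration.

```python
import json


def try_parse(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the index in text where parsing failed.
        print(f"Invalid JSON at character {err.pos}: {err.msg}")
        return None


valid = '[{"key0":"value0", "key2":"value2"}]'
broken = '[{"key0":"value0", "key2:"value2"}]'  # missing quote after key2

parsed_valid = try_parse(valid)
parsed_broken = try_parse(broken)
```

A None result signals that the text needs repairing, or a regex-based extraction, before it can be loaded.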
