I have a file that was written in a JSON structure but is not correctly formatted. The content looks similar to this:
[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]
Unlike many questions asked here before, the contents are all on a single line, so when I try to read the file line by line with readline(), I get the whole thing at once.
I am trying to extract only the information between the curly brackets { }, including the brackets themselves, and print it. I am able to open the file, but I am finding it difficult to read from a { to the next }, then continue to the following { and }, and so on. I don't really care about the square brackets, just the curly brackets. Also, the values can differ in length, so I can't set a fixed number of characters to read after the opening bracket; it is different for most sets of brackets.
Any guidance would be greatly appreciated.
CodePudding user response:
import re
fileContent = "[{'key0':'value0' , 'key1':'value1', 'key2':'value2'}, {'key0':'value3', 'key1':'value4', 'key2':'value5'}, {'key0':'value6', 'key1':'value7', 'key2':'value8'}]"
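# Matches each {...} block, braces included (what the question asks for).
pattern_with_braces = r'\{.*?\}'
# Matches only the text between the braces, without the braces themselves.
pattern_without_braces = r'(?<=\{).*?(?=\})'
parts = re.findall(pattern_with_braces, fileContent)
print(parts)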
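As a rough sketch of how this applies to the malformed text from the question: the regex does not care about the missing quotes, so it can run on the raw content as-is. The helper name extract_brace_blocks is made up for illustration, and the sample string stands in for what f.read() would return for the whole file.

```python
import re

def extract_brace_blocks(text):
    """Return every non-nested {...} block in text, braces included."""
    # Non-greedy match from each '{' to the nearest following '}'.
    return re.findall(r"\{.*?\}", text)

# Sample content from the question, including the missing quotes after key2.
# With a real file this would be: text = open(path).read()
sample = (
    '[{"key0":"value0" , "key1":"value1", "key2":"value2"}, '
    '{"key0":"value3", "key1":"value4", "key2:"value5"}, '
    '{"key0":"value6", "key1":"value7", "key2:"value8"}]'
)

blocks = extract_brace_blocks(sample)
for block in blocks:
    print(block)
```

Note that this assumes the blocks are not nested and that no value contains a } character; either case would need a real parser instead of a regex.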
CodePudding user response:
I suggest using the re module to repair the missing quotes in each line, and then parsing the repaired text with json.loads:
import re
import json
with open("data.txt") as f:
    lines = f.readlines()

for line in lines:
    # Add the missing closing quote to keys written as "key2: instead of "key2":
    modified = re.sub(r"({|\s)\"(\w+):", r'\1"\2":', line)
    dictionary = json.loads(modified)
    print(dictionary)
In your example, running the code above would result in something like:
[{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}, {'key0': 'value3', 'key1': 'value4', 'key2': 'value5'}, {'key0': 'value6', 'key1': 'value7', 'key2': 'value8'}]
Moreover, because the result is a list of dictionaries, you can access the keys and values of each one directly.
Note that the "data.txt" file in the code above is as follows:
[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]
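To make the repair step easier to follow, here is a minimal sketch that applies the same kind of substitution to a short fragment and then loops over the parsed records. The function name repair_keys is hypothetical, and the fragment is a shortened version of the sample data.

```python
import json
import re

def repair_keys(text):
    """Add the missing closing quote to keys written as "key: instead of "key":."""
    # A key starts after '{' or whitespace; \w+ followed directly by ':' means
    # the closing quote is missing, so it is re-inserted before the colon.
    return re.sub(r'([{\s])"(\w+):', r'\1"\2":', text)

broken = '[{"key0":"value3", "key1":"value4", "key2:"value5"}]'
fixed = repair_keys(broken)
records = json.loads(fixed)

for record in records:
    for key, value in record.items():
        print(key, value)
```

One caveat, as far as I can tell: a value that starts with word characters followed by a colon and is preceded by a space (for example a URL after ": ") would also match, so this is only safe for data shaped like the sample.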
CodePudding user response:
Try using the json.loads function from Python's json module, which deserializes a str, bytes, or bytearray instance containing a JSON document into a Python object.
To decode your JSON string (the missing quote after key2 has to be fixed first, otherwise json.loads raises an error):
import json
str_to_load = '[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2":"value5"}, {"key0":"value6", "key1":"value7", "key2":"value8"}]'
data = json.loads(str_to_load)
print(data[2]['key2'])
output:
value8
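If you are not sure whether the file content is valid JSON, a small wrapper like the one below can report where parsing fails instead of crashing. This is only a sketch; try_parse is a name I made up, while json.JSONDecodeError and its pos and msg attributes come from the standard json module.

```python
import json

def try_parse(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character index where the decoder gave up.
        print(f"Invalid JSON at character {err.pos}: {err.msg}")
        return None

# The first string has the missing quote from the question, the second is fixed.
bad_result = try_parse('[{"key2:"value8"}]')
good_result = try_parse('[{"key2":"value8"}]')
print(bad_result, good_result)
```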