I've got a long string that contains a list, for example:
'[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
I can use json5.loads and then get the first element with [0] on the resulting list, but json5.loads takes a long time for longer strings. Is there a way to get just the first element without loading the entire list? In this example it would be {"ex1": 0, "ex2":1}. Splitting by commas doesn't work for me, since the dictionaries in the list contain commas too. Thanks.
CodePudding user response:
If it'll definitely be that format, you can just search for the beginning and ending brackets.
mystr = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
first = mystr.index("{")
last = mystr.index("}")
extracted = mystr[first:last + 1]
print(extracted)
This prints {"ex1": 0, "ex2":1}
For a more complicated string:
mystr = '[{"ex1": {"ex1.33": -1, "ex1.66": -2}, "ex2":1}, {"ex3": 2, "ex4":3}]'
n_open = 0
n_close = 0
first = mystr.index("{")
for ii in range(len(mystr)):
    if mystr[ii] == "{":
        n_open += 1
    elif mystr[ii] == "}":
        n_close += 1
    # stop once every opened brace has been closed
    if n_open > 0 and n_open == n_close:
        break
extracted = mystr[first:ii + 1]
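One caveat with counting braces: a brace inside a string value (say "a": "}") throws the count off. Here is a minimal sketch that skips over string contents while scanning. It assumes strings are double-quoted with backslash escapes, as in plain JSON (json5 also allows single-quoted strings, which this does not handle), and first_object_substring is just a name I made up for illustration.

```python
def first_object_substring(text):
    """Return the first top-level {...} substring of text, or None if there is none."""
    depth = 0          # current brace nesting depth
    start = None       # index of the opening brace of the first object
    in_string = False  # True while inside a double-quoted string
    escaped = False    # True right after a backslash inside a string
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # braces inside strings are ignored
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


sample = '[{"a": "}{", "b": {"c": 1}}, {"d": 2}]'
print(first_object_substring(sample))
```

The scan stops as soon as the first object closes, so the rest of a long string is never touched.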
CodePudding user response:
Does your string work with ast.literal_eval()? If it does, you could do
import ast
obj = ast.literal_eval(s)
# obj[0] gives the first dict
Note that this still parses the whole string. If that is not an option, you could loop through the string character by character and yield a substring whenever the number of open brackets equals the number of close brackets.
def get_top_level_dict_str(s):
    open_br = 0
    close_br = 0
    open_index = 0
    for i, c in enumerate(s):
        if c == '{':
            if open_br == 0:
                open_index = i  # start of a new top-level dict
            open_br += 1
        elif c == '}':
            close_br += 1
        if open_br > 0 and open_br == close_br:
            yield s[open_index:i + 1]
            open_br = close_br = 0
If you want to parse the resulting substrings to objects, you could use json5 like you already do, which is probably faster on the smaller string, or use ast.literal_eval():
x = get_top_level_dict_str(s)
# next(x) gives the substring
# then use json5 or ast.literal_eval()
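If the string happens to be strict JSON (an assumption on my part, since json5 accepts things the standard json module rejects, such as single quotes or trailing commas), json.JSONDecoder.raw_decode from the standard library can parse a single value starting at a given index and stop there. A rough sketch, where first_element is a hypothetical helper name:

```python
import json


def first_element(text):
    """Parse only the first element of a JSON array given as a string."""
    decoder = json.JSONDecoder()
    # Move just past the opening bracket of the list.
    start = text.index("[") + 1
    # raw_decode does not skip leading whitespace, so do it by hand.
    while text[start].isspace():
        start += 1
    # raw_decode returns the parsed value and the index where parsing stopped.
    obj, _end = decoder.raw_decode(text, start)
    return obj


s = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
print(first_element(s))
```

This handles nested dictionaries and braces inside strings, because the real JSON parser does the work. It raises json.JSONDecodeError on an empty list or on input that is not valid JSON.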