How to get only the first element of a list contained in a string?

I've got a long string. This string contains a list, like this example:

'[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'

I can use json5.loads and then get the first element with [0] on the list, but json5.loads takes a long time on longer strings. Is there a way to get just the first element without loading the entire list? (In this example it would be {"ex1": 0, "ex2":1}.) Splitting by commas doesn't work for me, since the dictionaries in the list contain commas too. Thanks.

Answer:

If the string will definitely have that format (the first dict contains no nested braces), you can just search for the first opening and closing braces.

mystr = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
first = mystr.index("{")   # position of the first opening brace
last = mystr.index("}")    # position of the first closing brace
extracted = mystr[first:last + 1]
print(extracted)

This prints {"ex1": 0, "ex2":1}
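As a minimal sketch, the same idea can be wrapped in a function that also parses the extracted piece, so only the short substring is ever decoded. It assumes the first dict has no nested braces and that the data is valid JSON, so it uses the standard json module instead of json5; the function name first_flat_dict is hypothetical.

```python
import json

def first_flat_dict(s: str) -> dict:
    """Return the first {...} in s as a dict.

    Assumes the first dict contains no nested braces (hypothetical helper).
    """
    start = s.index("{")        # first opening brace
    end = s.index("}", start)   # first closing brace after it
    # Parse only the small extracted substring, not the whole list
    return json.loads(s[start:end + 1])

mystr = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
print(first_flat_dict(mystr))  # {'ex1': 0, 'ex2': 1}
```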

For a more complicated string, where the first dict contains nested dicts, count the opening and closing braces instead:

mystr = '[{"ex1": {"ex1.33": -1, "ex1.66": -2}, "ex2":1}, {"ex3": 2, "ex4":3}]'
n_open = 0
n_close = 0
first = mystr.index("{")
for ii in range(len(mystr)):
    if mystr[ii] == "{":
        n_open += 1
    elif mystr[ii] == "}":
        n_close += 1
    # every brace opened so far has been closed: the first dict is complete
    if n_open > 0 and n_open == n_close:
        break
extracted = mystr[first:ii + 1]
print(extracted)
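The counting loop above also counts braces that appear inside quoted values, so a value such as "a}b" would end the match too early. A minimal sketch that skips over double-quoted strings, assuming JSON-style quoting with backslash escapes (the name first_top_level_dict is hypothetical):

```python
import json

def first_top_level_dict(s: str) -> str:
    """Return the substring of the first top-level {...} object in s.

    Assumes JSON-style double-quoted strings; braces inside them are ignored.
    """
    depth = 0
    start = None
    in_string = False
    escaped = False
    for i, c in enumerate(s):
        if in_string:
            if escaped:
                escaped = False      # this character was escaped, skip it
            elif c == "\\":
                escaped = True       # the next character is escaped
            elif c == '"':
                in_string = False    # end of the string literal
            continue
        if c == '"':
            in_string = True         # start of a string literal
        elif c == "{":
            if depth == 0:
                start = i            # remember where the object begins
            depth += 1
        elif c == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    raise ValueError("no complete {...} object found")

s = '[{"ex1": {"ex1.33": -1, "ex1.66": -2}, "ex2": "a}b"}, {"ex3": 2}]'
print(json.loads(first_top_level_dict(s)))
```

Only the returned substring needs to be passed to a parser afterwards.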

Answer:

Does your string work with ast.literal_eval()? If it does, you could do

import ast

obj = ast.literal_eval(s)
# obj[0] gives the first dict
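For the sample string from the question, a sketch of this looks like the block below. Note that literal_eval still parses the whole list, so it may not be faster than json5.loads on long strings, and JSON-only tokens such as true, false and null are not Python literals, so literal_eval raises ValueError on them.

```python
import ast

s = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
obj = ast.literal_eval(s)   # parses the entire string as a Python literal
first = obj[0]
print(first)  # {'ex1': 0, 'ex2': 1}
```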

If not, you could loop through the string character by character and yield each substring at the point where the number of opening braces equals the number of closing braces.

def get_top_level_dict_str(s):
    open_br = 0
    close_br = 0
    open_index = 0
    for i, c in enumerate(s):
        if c == '{':
            if open_br == 0:
                open_index = i  # start of a new top-level dict
            open_br += 1
        elif c == '}':
            close_br += 1
            if open_br > 0 and open_br == close_br:
                # braces balanced: a complete top-level dict ends here
                yield s[open_index:i + 1]
                open_br = close_br = 0

If you want to parse the resulting substrings into objects, you could use json5 as you already do (probably faster on the smaller string) or ast.literal_eval():

x = get_top_level_dict_str(s)
first_str = next(x)  # substring of the first dict only; the rest is not scanned
# then parse first_str with json5.loads() or ast.literal_eval()
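A different option, if the data is strict JSON rather than JSON5, is the standard library's json.JSONDecoder.raw_decode, which decodes one JSON value from the start of its argument and returns it together with the index where that value ended, leaving the rest unparsed. A minimal sketch, assuming a non-empty array; the name first_element is hypothetical:

```python
import json

def first_element(s: str):
    """Decode only the first element of a JSON array stored in s.

    Assumes strict JSON (not JSON5) and a non-empty array.
    """
    start = s.index("[") + 1               # just past the opening bracket
    while start < len(s) and s[start].isspace():
        start += 1                          # raw_decode does not skip leading whitespace
    # raw_decode parses a single JSON value and returns (value, end);
    # everything after `end` is never parsed.
    value, _end = json.JSONDecoder().raw_decode(s[start:])
    return value

s = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
print(first_element(s))  # {'ex1': 0, 'ex2': 1}
```

Because the decoder stops after the first value, even invalid text later in the string does not cause an error.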