How to split on only some characters within a string

Time:09-29

I need to convert a line of SQL into a dictionary. Given the following partial line:

'field_1="value_1", field_2=1234'

I want to create the following dictionary:

{"field_1": "value_1", "field_2": 1234}

Using:

text = text.replace(",", "=")
text = text.split("=")
text = [i.strip() for i in text]
col_names = text[::2]
values = text[1::2]
dict_ = dict(zip(col_names, values))

I get as close as:

{'field_1': '"value_1"', 'field_2': '1234'}

I'm pretty sure I'll be able to sort out the extra quotation marks. However, I'm struggling with this line, because the comma inside the quotation marks wrecks my split():

'field_1="value_1a, value_1b", field_2=1234'

I have a feeling I might be able to use regex here to solve both my quotation mark and extra comma issues, but I can't work out the exact syntax. The values can sometimes be strings and sometimes integers/floats. The field names can vary considerably and there aren't always spaces after commas. There also isn't a comma at the end of the string.

Thanks in advance!

CodePudding user response:

Here is a very simple approach that should do the job for you:

def parse(s):
    buf = []
    token = ""
    quotes = False
    punct = {",", "="}
    for c in s:
        if c == '"':
            quotes = not quotes
        elif c in punct and not quotes:
            buf.append(token)
            token = ""
        elif c != " " or quotes:
            token += c
    buf.append(token)
    return dict(zip(buf[::2], buf[1::2]))

print(parse('field_1="value_1a, value_1b", field_2=1234'))
# {'field_1': 'value_1a, value_1b', 'field_2': '1234'}

It should handle all simple cases (including commas inside the values). However, you'll need to improve it to work with different types of quotes, quotes inside quotes, etc.
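As a rough sketch of one such improvement, the variant below tracks which quote character is currently open, so both single and double quotes work, and treats a backslash inside quotes as an escape. The name parse_quoted and the backslash-escape convention are assumptions made for illustration; the question does not say how quotes are escaped in the input.

```python
def parse_quoted(s):
    """Split 'key=value' pairs on ',' and '=' that appear outside quotes."""
    tokens = []
    token = ""
    quote = None      # the quote character currently open, or None
    escaped = False   # True when the previous character was a backslash
    for c in s:
        if escaped:
            # Assumption: a backslash inside quotes escapes the next character
            token += c
            escaped = False
        elif c == "\\" and quote:
            escaped = True
        elif quote:
            if c == quote:
                quote = None   # closing quote, not part of the value
            else:
                token += c
        elif c in ('"', "'"):
            quote = c          # opening quote, not part of the value
        elif c in ",=":
            tokens.append(token)
            token = ""
        elif c != " ":
            token += c
    tokens.append(token)
    return dict(zip(tokens[::2], tokens[1::2]))


print(parse_quoted("field_1='value_1a, value_1b', field_2=1234"))
```

Like the original function, this keeps every value as a string; converting numbers would be a separate step.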

I hope this points you in the right direction.

CodePudding user response:

As Grzegorz Oledzki said, you'll want to split on your commas and then "deal with" (strip) your quotation marks afterwards:

x = 'field_1="value_1", field_2=1234'

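# Split on commas, then split each pair on the first "=" and strip whitespace and quotes
x = {k.strip(): v.strip().strip('"') for k, v in (i.split("=", 1) for i in x.split(","))}
# {'field_1': 'value_1', 'field_2': '1234'}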

Note that this doesn't evaluate integers, so we can expand on it to do that:

import ast

x = 'field_1="value_1", field_2=1234'

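def convert(value):
    value = value.strip()
    # Quoted values stay strings; everything else is evaluated as a Python literal
    return value.strip('"') if '"' in value else ast.literal_eval(value)

x = {k.strip(): convert(v) for k, v in (i.split("=", 1) for i in x.split(","))}
# {'field_1': 'value_1', 'field_2': 1234}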
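Splitting on every comma still breaks on a value such as "value_1a, value_1b" from the question. One possible way around that, sketched here with the regex the question asks about, is to match whole key=value pairs instead of splitting. The names PAIR and to_dict are made up for this example, and the pattern assumes only double quotes with no escaped quotes inside them.

```python
import ast
import re

# One key=value pair: the value is either double-quoted or runs to the next comma
PAIR = re.compile(r'''
    \s*(?P<key>[^=,\s]+)\s*=\s*      # field name up to '='
    (?:"(?P<quoted>[^"]*)"           # double-quoted value, may contain commas
      |(?P<bare>[^,]*))              # or an unquoted value up to the next comma
''', re.VERBOSE)


def to_dict(text):
    result = {}
    for m in PAIR.finditer(text):
        if m.group("quoted") is not None:
            # Quoted values are kept as strings, without the quotes
            result[m.group("key")] = m.group("quoted")
        else:
            bare = m.group("bare").strip()
            try:
                # Turn 1234 or 1.5 into int/float
                result[m.group("key")] = ast.literal_eval(bare)
            except (ValueError, SyntaxError):
                # Not a Python literal: fall back to the raw text
                result[m.group("key")] = bare
    return result


print(to_dict('field_1="value_1a, value_1b", field_2=1234'))
# {'field_1': 'value_1a, value_1b', 'field_2': 1234}
```

Because the quoted alternative is tried first, a comma inside the quotes is consumed as part of the value and never seen as a separator.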