General function to turn string into **kwargs

Time:01-26

I'm trying to find a way to pass a string (coming from outside the Python world!) that can be interpreted as **kwargs once it gets to the Python side.

I have been trying to use this pyparsing example, but the string that's being passed in that example is too specific, and I had never heard of pyparsing until now. I'm trying to make it more human-friendly and robust to small differences in spacing etc. For example, I would like to pass the following.

input_str = "a = [1,2], b= False, c =('abc', 'efg'),d=1"

desired_kwargs = {'a': [1,2], 'b': False, 'c': ('abc','efg'), 'd': 1}

When I try this code though, no love.

from pyparsing import *

# Names for symbols
_quote = Suppress('"')
_eq = Suppress('=')

# Parsing grammar definition
data = (
        delimitedList(                   # Zero or more comma-separated items
            Group(                       #   Group the contained unsuppressed tokens in a list
                Regex(r'[^=,)\s]+') +    #     Grab everything up to an equal, comma, endparen or whitespace as a token
                Optional(                #     Optionally...
                    _eq +                #       match an =
                    _quote +             #       a quote
                    Regex(r'[^"]*') +    #       Grab everything up to another quote as a token
                    _quote)              #       a quote
                )                        #   EndGroup - will have one or two items.
            ))                           # EndList
              

def process(s):
    items = data.parseString(s).asList()
    args = [i[0] for i in items if len(i) == 1]
    kwargs = {i[0]:i[1] for i in items if len(i) == 2}
    return args,kwargs


def hello_world(named_arg, named_arg_2 = 1, **kwargs):
    print(process(kwargs))
    
hello_world(1, 2, "my_kwargs_are_gross = True, some_bool=False, a_list=[1,2,3]")

#output: "{my_kwargs_are_gross : True, some_bool:False, a_list:[1,2,3]}"

Requirements:

  1. The '{' and '}' will be appended on the code side.
  2. Only standard types / standard iterables (list, tuple, etc) will be used in the kwargs-string. No special characters that I can think of...
  3. The kwargs-string will be like they are entered into a function on the python side, ie, 'x=1, y=2'. Not as a string of a dictionary.
  4. I think it's a safe assumption that the first step in the string parse will be to remove all whitespace.

CodePudding user response:

One option could be to use the ast module to parse some wrapping of the string that turns it into a valid Python expression. Then you can even use ast.literal_eval if you’re okay with everything it can produce:

>>> import ast
>>> kwargs = "a = [1,2], b= False, c =('abc', 'efg'),d=1"
>>> expr = ast.parse(f"dict({kwargs}\n)", mode="eval")
>>> {kw.arg: ast.literal_eval(kw.value) for kw in expr.body.keywords}
{'a': [1, 2], 'b': False, 'c': ('abc', 'efg'), 'd': 1}
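
If you want this as a reusable function, here is a minimal sketch of the same idea. The name parse_kwargs and the extra checks (rejecting positional arguments and ** unpacking) are my own assumptions, not part of the snippet above. Nothing in the string is executed; values that are not plain literals make ast.literal_eval raise ValueError.

```python
import ast


def parse_kwargs(text: str) -> dict:
    """Turn a string like "x=1, y=[2, 3]" into a dict without running any code."""
    # The newline before ")" keeps a trailing "# comment" in text
    # from swallowing the closing parenthesis.
    tree = ast.parse(f"dict({text}\n)", mode="eval")
    call = tree.body
    # The whole expression must be the single dict(...) call we wrapped it in.
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("input must be a plain keyword-argument list")
    if call.args:
        raise ValueError("positional arguments are not allowed")
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # "**mapping" unpacking has no keyword name
            raise ValueError("'**' unpacking is not allowed")
        # literal_eval accepts only literals (numbers, strings, lists, tuples, ...).
        result[kw.arg] = ast.literal_eval(kw.value)
    return result


print(parse_kwargs("a = [1,2], b= False, c =('abc', 'efg'),d=1"))
```

Malformed input surfaces as SyntaxError from ast.parse, so a caller may want to catch both SyntaxError and ValueError.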

CodePudding user response:

Since the format of your input string is already a valid Python argument list, you don't have to reinvent the wheel with pyparsing but can simply enclose the string in a dict constructor for eval to create the desired kwargs:

desired_kwargs = eval(f'dict({input_str})')

However, evaluating a string from the outside world carries the security risk of code injection. Since any actual harm can only be done by making a function call, an easy way to avoid the risk is to parse the code with ast.parse and use ast.walk to reject the AST if it contains more than one ast.Call node (there has to be exactly one ast.Call node, since we are making a call to the dict constructor):

import ast

code = f'dict({input_str})'
assert sum(isinstance(node, ast.Call) for node in ast.walk(ast.parse(code))) == 1
desired_kwargs = eval(code)

Demo: https://replit.com/@blhsing/OrnateScarceShelfware
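
One caveat with the assert version: assert statements are skipped when Python runs with -O, so the check would silently disappear. Below is a sketch of the same call-counting idea that raises an exception instead. The function name eval_kwargs and the restricted globals dict are my own additions and should be read as an extra precaution, not as a sandbox.

```python
import ast


def eval_kwargs(input_str: str) -> dict:
    """Evaluate "x=1, y=2" as dict(x=1, y=2), allowing no calls besides dict()."""
    tree = ast.parse(f"dict({input_str})", mode="eval")
    # Count every call in the expression; the dict(...) wrapper accounts for one.
    calls = sum(isinstance(node, ast.Call) for node in ast.walk(tree))
    if calls != 1:
        raise ValueError(f"expected exactly one call, found {calls}")
    # Assumption: only the name "dict" is exposed to the evaluated expression.
    allowed_globals = {"__builtins__": {}, "dict": dict}
    return eval(compile(tree, "<kwargs>", "eval"), allowed_globals)


print(eval_kwargs("a = [1,2], b= False, c =('abc', 'efg'),d=1"))
```

Unlike the literal_eval approach, this still evaluates arbitrary call-free expressions, so something like a huge exponentiation can still waste CPU time.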

CodePudding user response:

You already have some good answers (much easier than this one) if the string you are being passed is well-behaved Python. But if you don't trust the input and/or want to define something a little different, then being explicit about the format you expect may be desirable. In that case, pyparsing is quite useful and readable. The grammar from the question you linked isn't complex enough to handle all your cases, but if you break your grammar out into its constituent elements it is relatively easy to build:

from pyparsing import *

string_arg = QuotedString("'", esc_char="\\", unquote_results=False) | QuotedString("\"", esc_char="\\", unquote_results=False)

# Try the float form first and combine it into one token; otherwise "1.5" would match as the integer "1"
number_arg = Combine(Word(nums) + "." + Word(nums)) | Word(nums)

boolean_arg = Literal("True") | Literal("False")

array_item = string_arg | number_arg
array_list = delimitedList(array_item)
array_arg = Literal("[") + array_list + Literal("]")
tuple_arg = Literal("(") + array_list + Literal(")")

arg_name = Word(identchars, identbodychars)
arg_value = string_arg | number_arg | boolean_arg | tuple_arg | array_arg
arg_item = arg_name + Literal("=").suppress() + arg_value
arg_list = delimitedList(arg_item)

def parseActionValue(string, location, tokens):
    emit_tokens = []
    if tokens[0] == '[':
        emit_tokens = [eval('[' + ','.join(tokens[1:-1]) + ']')]
    elif tokens[0] == '(':
        emit_tokens = eval('(' + ','.join(tokens[1:-1]) + ')')
    else:
        emit_tokens = eval(tokens[0])
    return emit_tokens

arg_value.setParseAction(parseActionValue)

def construct_args(s):
    arr = arg_list.parse_string(s, parse_all=True)
    args = {}
    for i in range(0, len(arr), 2):
        args[arr[i]] = arr[i + 1]
    return args

If you want to do something a little different, or verify that the tokens look the way you expect, set a parse action (setParseAction) on the element you want to work with and emit the Python objects you want in the dict.

CodePudding user response:

python-makefun can parse these sorts of strings and may be useful for whatever the use case of the original question is:

import inspect
import makefun


def process_signature(sig: str) -> dict:
    sig = f"f({sig})"
    f = makefun.create_function(sig, (lambda: None))
    result = {}
    for name, arg in inspect.signature(f).parameters.items():
        result[name] = arg.default
    return result


process_signature("a = [1,2], b= False, c =('abc', 'efg'),d=1")

That outputs the desired result: {'a': [1, 2], 'b': False, 'c': ('abc', 'efg'), 'd': 1}
