I am reading text from files, and the text looks like this:
"(id=336346860, name='Western Australia', slug='western-australia', has_public_page=True, lat=-26.0, lng=121.0)"
I would like to convert this to a dict. I tried passing it to dict(), but it raises an error:
fileOutput = "(id=336346860, name='Western Australia', slug='western-australia', has_public_page=True, lat=-26.0, lng=121.0)"
x = dict(fileOutput)
Error:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Can someone help me find a solution for this?
CodePudding user response:
A more robust approach would be to prefix the string with an identifier, such as _, to make it valid Python syntax for a function call, then use ast.parse to parse the string as Python code, traverse the code tree with ast.walk, and look for the ast.Call node, which has a keywords attribute holding a list of keyword arguments, from which you can extract the name from the arg attribute and the value from the value attribute. Since the value attribute itself can be an expression such as -26.0 in your sample input, consisting of a constant of 26.0 and a unary operation of -, you can use ast.literal_eval to evaluate the node and convert it to the value it represents:
import ast

{
    keyword.arg: ast.literal_eval(keyword.value)
    for node in ast.walk(ast.parse('_' + fileOutput)) if isinstance(node, ast.Call)
    for keyword in node.keywords
}
With your sample input, this returns:
{'id': 336346860, 'name': 'Western Australia', 'slug': 'western-australia', 'has_public_page': True, 'lat': -26.0, 'lng': 121.0}
CodePudding user response:
You can do something with ast.parse. Parse the string as a call to any function (it doesn't have to be dict), then extract the keyword arguments. For example, start with
>>> import ast
>>> mod = ast.parse('dict' + fileOutput)
>>> print(ast.dump(mod, indent=4))
Module(
    body=[
        Expr(
            value=Call(
                func=Name(id='dict', ctx=Load()),
                args=[],
                keywords=[
                    keyword(
                        arg='id',
                        value=Constant(value=336346860)),
                    keyword(
                        arg='name',
                        value=Constant(value='Western Australia')),
                    keyword(
                        arg='slug',
                        value=Constant(value='western-australia')),
                    keyword(
                        arg='has_public_page',
                        value=Constant(value=True)),
                    keyword(
                        arg='lat',
                        value=UnaryOp(
                            op=USub(),
                            operand=Constant(value=26.0))),
                    keyword(
                        arg='lng',
                        value=Constant(value=121.0))]))],
    type_ignores=[])
You can now extract the keywords pretty easily. You can expect arbitrary trees even in the arguments, so you will have to apply ast.literal_eval to each keyword value independently. This is not particularly difficult.
First sanitize the input a little to make sure it at least appears to be a call to the dict constructor (or whatever function name you prepended):
if len(mod.body) > 1 or not isinstance(call := mod.body[0].value, ast.Call) or call.func.id != 'dict':
    raise ValueError('Not just one dict')
if call.args:
    raise ValueError('Why are there positional args?')
Now you can extract the keywords:
>>> {x.arg: ast.literal_eval(x.value) for x in call.keywords}
{'id': 336346860,
'name': 'Western Australia',
'slug': 'western-australia',
'has_public_page': True,
'lat': -26.0,
'lng': 121.0}
ast.literal_eval raises a ValueError if anyone tries to sneak in arbitrary function calls.
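As a minimal sketch of that behaviour (the helper name keywords_to_dict and the malicious-looking input are made up for illustration), the snippet below parses one well-formed string and one whose value is a function call. The call is only parsed, never executed, and ast.literal_eval rejects it:

```python
import ast

def keywords_to_dict(text):
    """Parse '(k=v, ...)' into a dict, accepting only literal values.

    Hypothetical helper for illustration; not part of the answer's code.
    """
    call = ast.parse('dict' + text).body[0].value
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}

# Plain literals (including a negative number) are accepted.
print(keywords_to_dict("(lat=-26.0, ok=True)"))

# A value that is a function call is rejected with ValueError.
try:
    keywords_to_dict("(id=1, cwd=__import__('os').getcwd())")
except ValueError as exc:
    print("rejected:", exc)
```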
TL;DR
import ast

def parse_line(line):
    mod = ast.parse('dict' + line)
    if len(mod.body) > 1 or not isinstance(call := mod.body[0].value, ast.Call) or call.func.id != 'dict':
        raise ValueError('Not just one dict')
    if call.args:
        raise ValueError('Why are there positional args?')
    return {x.arg: ast.literal_eval(x.value) for x in call.keywords}
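Since the question mentions reading the text from files, here is a rough sketch of applying the same idea line by line. It assumes one record per line and skips blank lines. io.StringIO stands in for an open file, the second record is invented for the example, and parse_line here is a compact variant using mode='eval' rather than the exact function above:

```python
import ast
import io

def parse_line(line):
    """Parse one '(k=v, ...)' line by treating it as a call to dict."""
    expr = ast.parse('dict' + line.strip(), mode='eval').body
    if not isinstance(expr, ast.Call) or expr.args:
        raise ValueError('expected only keyword arguments')
    return {kw.arg: ast.literal_eval(kw.value) for kw in expr.keywords}

# Stand-in for open('some_file.txt'); the second record is made up.
fake_file = io.StringIO(
    "(id=336346860, name='Western Australia', slug='western-australia', "
    "has_public_page=True, lat=-26.0, lng=121.0)\n"
    "\n"
    "(id=1, name='Example', slug='example', has_public_page=False, lat=0.0, lng=0.0)\n"
)

# Skip blank lines, parse everything else.
records = [parse_line(line) for line in fake_file if line.strip()]
print(len(records))        # 2
print(records[0]['name'])  # Western Australia
```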
CodePudding user response:
I built a custom class to fit your requirements, based on a few assumptions listed below:
- Input always starts and ends with parentheses ().
- Input may only be "" (empty string), "()" (empty parentheses), or actual values like "(id=336346860, name='Western Australia', slug='western-australia', has_public_page=True, lat=-26.0, lng=121.0)".
- Values will only be one of the Python types str, bool, int, float.
- Key-value pairs are always separated by =.
- , (comma) is not part of any value (i.e. a comma is not present anywhere in the values).
If any one of the above assumptions is broken, the class may not work as expected.
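To see why the comma assumption matters, here is a small sketch with an invented input whose name contains ", ". Splitting on ", " cuts the quoted value in two, and the second fragment can no longer be split into a key and a value:

```python
# Hypothetical input: the name value itself contains ", "
text = "(id=1, name='Perth, Western Australia')"

# Strip the surrounding parentheses and split the way the class does.
parts = text[1:-1].split(", ")
print(parts)  # ['id=1', "name='Perth", "Western Australia'"]

# The last fragment has no '=', so unpacking into key and value fails.
try:
    for elem in parts:
        key, value = elem.split("=")
except ValueError as exc:
    print("split failed:", exc)
```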
Code is given below:
from typing import Optional


class MyDict:
    def setRawElements(self):
        """Create a list by splitting the given string"""
        # Assumption #5
        # If there is any comma in the value, then the split may be inconsistent
        self.raw_elements = self.string.split(", ")

    def splitKeyValuePairs(self):
        """Split into key-value pairs and create an internal dictionary"""
        for elem in self.raw_elements:
            # Assumption #4
            # If the key and the value are not separated by '=', then the split may be inconsistent
            # maxsplit=1 keeps any '=' inside the value intact
            key, value = elem.split("=", 1)
            self.dictionary[key] = value

    def setKeyTypes(self):
        """Type conversion"""
        for key, value in self.dictionary.items():
            # Assumption #3
            # Value must be one among (bool, str, float, int)
            if value in ["True", "False"]:
                # bool("False") would be True (non-empty string),
                # so compare against the literal text instead
                self.dictionary[key] = value == "True"
                continue
            elif value and value[0] == value[-1] == "'":
                # the value is a quoted str: strip the quotes
                self.dictionary[key] = self.dictionary[key][1:-1]
                # we need not convert a str to str, so we can skip the conversion part
                continue
            elif "." in value:
                # float values have two parts, integer and fraction, separated by a period
                type_ = float
            else:
                # if none of the above cases match, we assume that the type is int
                type_ = int
            # type conversion from str to the expected type
            self.dictionary[key] = type_(self.dictionary[key])

    def parse(self, string):
        self.dictionary = {}
        self.string = string
        if string and string[1:-1]:
            # Assumption #1 and #2
            # If string is not empty and not just empty parentheses
            self.string = self.string[1:-1]  # remove parentheses from start and end
            self.setRawElements()
            self.splitKeyValuePairs()
            self.setKeyTypes()
        return self.dictionary

    def __new__(cls, string: str) -> Optional[dict]:
        """Calling the class returns the parsed dictionary"""
        return super().__new__(cls).parse(string)
To use the class, refer to the code below:
fileOutput = "(id=336346860, name='Western Australia', slug='western-australia', has_public_page=True, lat=-26.0, lng=121.0)"
x = MyDict(fileOutput)
print(x)
Below is the output:
{'id': 336346860, 'name': 'Western Australia', 'slug': 'western-australia', 'has_public_page': True, 'lat': -26.0, 'lng': 121.0}
To check the types of the values, refer to the code below:
for key, value in x.items():
print(key, value, type(value), sep=" - ")
Output:
id - 336346860 - <class 'int'>
name - Western Australia - <class 'str'>
slug - western-australia - <class 'str'>
has_public_page - True - <class 'bool'>
lat - -26.0 - <class 'float'>
lng - 121.0 - <class 'float'>