I am new to Python and I can't find a way to turn a string of mine into a list.
I read a file that contains this exact value (it is one field of a CSV file):
[{a,b,c},{aa,bb,cc}]
I want to convert that string into an array of 2 objects:
[{
val_a: 'a',
val_b: 'b',
val_c: 'c'
},
{
val_a: 'aa',
val_b: 'bb',
val_c: 'cc'
}]
Is there a simple way of doing this?
So, first off, what you're calling an "object" in your description is really a JavaScript-style object literal, a construct that isn't commonly used in other languages. This older style of JavaScript object, created without a class declaration that gives objects a predefined set of attributes and methods, is essentially a string-keyed hash table that you can bolt object-oriented features onto after the fact. The closest Python equivalent is `dict` syntax (`dict` is a fully general hash table, but you can't bolt additional class behaviors onto it after the fact), but using it is still frowned upon if you intend to make real objects with a consistent set of attributes. Plain `dict`s don't clearly indicate which keys are expected to exist (the next `dict` might have an extra key, and the one after it might be missing one), and Python makes no real effort to optimize large numbers of plain `dict`s that just happen to share the same keys. Most Python objects store their attributes in an underlying `dict`, and CPython optimizes those instance `dict`s to use less memory (key-sharing dictionaries), but that optimization relies on a class existing to hold the shared keys for future instance `dict`s to reuse.
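To make the key-consistency point concrete, here is a small sketch: two plain dicts meant to describe the same kind of record can silently disagree about their keys, whereas a class with a declared field list catches the problem when the record is built. `rec1`, `rec2` and `Record` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Two plain dicts meant to describe the same kind of record
rec1 = {'val_a': 'a', 'val_b': 'b', 'val_c': 'c'}
rec2 = {'val_a': 'aa', 'val_b': 'bb'}  # 'val_c' forgotten; Python doesn't complain

# The mismatch only surfaces later, when some code assumes the key exists
missing = [rec for rec in (rec1, rec2) if 'val_c' not in rec]
print(len(missing))  # 1

# A class with a declared field list (hypothetical name) rejects the incomplete record up front
Record = namedtuple('Record', 'val_a val_b val_c')
rejected = False
try:
    Record(val_a='aa', val_b='bb')
except TypeError:
    rejected = True  # the missing 'val_c' argument is caught at construction time
print(rejected)  # True
```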
The simplest way to do this Pythonically would be to use a `namedtuple` (a lightweight class defining lightweight immutable objects with a fixed set of fields; it actually uses less memory than an equivalent hand-written class, unless that class explicitly uses `__slots__` to opt out of a per-instance attribute `dict`):
```python
import json
import re
from collections import namedtuple

# Defined once at the top level of the file for reuse; makes a lightweight tuple subclass
MyClass = namedtuple('MyClass', 'val_a val_b val_c')

mystr = '[{a,b,c},{aa,bb,cc}]'

# Make the string legal JSON (needs tweaking based on real data).
# {...} looks like a set literal, but sets are unordered; use JSON arrays to keep the ordering
mystr = mystr.replace('{', '[').replace('}', ']')
# Wrap each run of word characters in double quotes so JSON sees strings
quoted_str = re.sub(r'(\w+)', r'"\1"', mystr)

# Decode from JSON to Python types
orig_data = json.loads(quoted_str)

# Convert from a list of three-element lists to a list of MyClass instances
obj_data = [MyClass(*datagrp) for datagrp in orig_data]
print(obj_data)

# If you'd like it to look like a dict, expand the memory-efficient namedtuples to less efficient dicts as needed
print([obj._asdict() for obj in obj_data])
```
which, on Python 3.8 and higher, produces the output:
```
[MyClass(val_a='a', val_b='b', val_c='c'), MyClass(val_a='aa', val_b='bb', val_c='cc')]
[{'val_a': 'a', 'val_b': 'b', 'val_c': 'c'}, {'val_a': 'aa', 'val_b': 'bb', 'val_c': 'cc'}]
```
Before 3.8, the final line prints `OrderedDict`s rather than plain `dict`s (before 3.7, `dict` wasn't guaranteed to preserve key order, so `_asdict` originally returned an `OrderedDict` to do so). If you want plain `dict`s there, and don't need guaranteed key ordering on versions before 3.7, change the final line to:

```python
print([dict(obj._asdict()) for obj in obj_data])
```

and it will produce plain `dict`s.
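As the comment in that code says, the JSON trick needs tweaking for real data: `\w+` only matches letters, digits and underscores, so a value like `hello world` or `b-1` would produce invalid JSON. One alternative, sketched here under the assumption that every group is flat (no nested braces) and no value contains a comma or brace, is to skip JSON entirely and pull out each `{...}` group with `re.findall`. `parse_groups` is a hypothetical helper name, not part of any library.

```python
import re
from collections import namedtuple

MyClass = namedtuple('MyClass', 'val_a val_b val_c')

def parse_groups(text):
    """Parse '[{a,b,c},{aa,bb,cc}]'-style text into MyClass instances.

    Assumes groups are flat (no nested braces) and values never contain
    commas or braces; whitespace around each value is stripped.
    """
    groups = re.findall(r'\{([^{}]*)\}', text)  # the contents of each {...} group
    return [MyClass(*(item.strip() for item in grp.split(','))) for grp in groups]

print(parse_groups('[{a,b,c},{aa,bb,cc}]'))
print(parse_groups('[{hello world, b-1, c.2}]'))
```

A group with the wrong number of values raises a `TypeError` from the `namedtuple` constructor, which is usually what you want for malformed input.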
If you don't want `namedtuple`s for whatever reason (most commonly because you don't want the objects to behave as `tuple`s, with a length and iterability, or because you want them to be mutable), you can define a similar lightweight class with the `dataclasses` module; it's just a little more verbose:
```python
from dataclasses import dataclass, asdict

# On 3.10+, use @dataclass(slots=True) if you want less memory per instance and an
# AttributeError (instead of a silently created attribute) when assigning to a non-existent attribute
@dataclass
class MyClass:
    val_a: str
    val_b: str
    val_c: str
```
That defines everything you need to use the rest of the code from before; the only change is replacing `obj._asdict()` with `asdict(obj)` (`dataclasses` made it a top-level function of the module instead of a method on the type to avoid polluting the class's namespace). You also never need the `dict()` wrapping that `namedtuple` requires before 3.8, because:
- `dataclasses` was only introduced in 3.7 (which already has insertion-ordered plain `dict`s), and
- because of that, `asdict` produced a plain `dict` by default from the beginning.
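Putting those pieces together, a dataclass version of the earlier pipeline might look like the sketch below. Plain `@dataclass` is used so it runs on 3.7+ (add `slots=True` on 3.10+ if you want it). It reuses the same JSON-based conversion, so the same caveats about real data apply, and it ends by showing the mutability difference from `namedtuple`.

```python
import json
import re
from dataclasses import dataclass, asdict

@dataclass
class MyClass:
    val_a: str
    val_b: str
    val_c: str

mystr = '[{a,b,c},{aa,bb,cc}]'

# Same conversion as the namedtuple version: braces -> JSON arrays, then quote each word
quoted_str = re.sub(r'(\w+)', r'"\1"', mystr.replace('{', '[').replace('}', ']'))
obj_data = [MyClass(*datagrp) for datagrp in json.loads(quoted_str)]

print(obj_data)
# asdict() is a module-level function rather than a method, and returns a plain dict
print([asdict(obj) for obj in obj_data])

# Unlike a namedtuple, a (non-frozen) dataclass instance can be modified in place
obj_data[0].val_a = 'z'
print(obj_data[0])
```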