Create a List from a string in Python

I am new to Python and I can't find a way to turn a string of mine into a list.

I read a file that contains this exact output (it is one parameter of a CSV file):

[{a,b,c},{aa,bb,cc}]

I want to format that string to get an array of 2 objects:

[{
    val_a: 'a',
    val_b: 'b',
    val_c: 'c'
},
{
    val_a: 'aa',
    val_b: 'bb',
    val_c: 'cc'
}]

Is there a simple way of doing this?

CodePudding user response：

So, first off, what you're calling an "object" in your description is really a JavaScript-style object literal, a notation that is not commonly used in other languages. This older style of JavaScript object, without an actual class declaration that constructs objects with pre-defined sets of attributes and methods, is essentially just a string-keyed hash table that you can bolt fully object-oriented features onto after the fact.

It's close to Python's dict syntax (where dict is a fully generalized hash table, but with no ability to bolt on additional class behaviors after the fact), but using plain dicts is still frowned upon if you intend to make actual objects with a consistent set of attributes. Plain dicts don't clearly indicate which keys are expected to exist (maybe the next dict has an extra key, and the one after is missing a key), and Python makes no real effort to optimize for large numbers of plain dicts that just happen to all be defined with the same set of keys. Most Python objects are implemented with an underlying dict to store their attributes, and CPython has optimized dict to reduce memory usage in that use case, but the optimization relies on the existence of a class on which to store the shared keys for retrieval by future instance dicts.
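
To make the "which keys are expected" point concrete, here is a small sketch showing that plain dicts accept any shape without complaint. The `records` list is made-up illustration data, not something from the question:

```python
# Hypothetical data: three dicts that are "supposed" to share one shape.
records = [
    {"val_a": "a", "val_b": "b", "val_c": "c"},
    {"val_a": "aa", "val_b": "bb"},                            # val_c is missing
    {"val_a": "x", "val_b": "y", "val_c": "z", "val_d": "?"},  # extra key
]

# Nothing above raised an error; the mismatch only shows up on inspection.
shapes = {frozenset(rec) for rec in records}
print(len(shapes), "distinct key sets")

# A missing key fails late, at the point of use, not at construction.
missing = None
try:
    records[1]["val_c"]
except KeyError as exc:
    missing = exc.args[0]
    print("missing key:", missing)
```

A class with a fixed set of fields moves that failure to construction time, which is usually where you want it.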

The simplest Pythonic way to do this is to use a namedtuple (a lightweight class defining lightweight immutable objects with a fixed set of fields; it actually uses less memory than the equivalent manually defined class unless that class explicitly uses __slots__ to opt out of a per-instance attribute dict):

import json
import re
from collections import namedtuple

MyClass = namedtuple('MyClass', 'val_a val_b val_c')  # Defined once at top-level of the file for
                                                      # reuse; makes a lightweight tuple subclass
mystr = '[{a,b,c},{aa,bb,cc}]'

# Make string legal JSON (needs tweaking based on real data)
mystr = mystr.replace('{', '[').replace('}', ']')  # Sets are unordered, we don't want to lose ordering by converting to set
quoted_str = re.sub(r'(\w+)', r'"\1"', mystr)  # Wrap each run of word characters in double quotes

# Decode from JSON to Python types
orig_data = json.loads(quoted_str)

# Convert from list of three-lists to list of MyClass instances
obj_data = [MyClass(*datagrp) for datagrp in orig_data]

print(obj_data)

# If you'd like it to look like a dict, you can expand from memory-efficient namedtuples to less efficient dicts as needed
print([obj._asdict() for obj in obj_data])

On Python 3.8 and higher, this produces the output:

[MyClass(val_a='a', val_b='b', val_c='c'), MyClass(val_a='aa', val_b='bb', val_c='cc')]
[{'val_a': 'a', 'val_b': 'b', 'val_c': 'c'}, {'val_a': 'aa', 'val_b': 'bb', 'val_c': 'cc'}]

Pre-3.8, the final line would print OrderedDicts rather than dicts (until 3.7, dict wasn't guaranteed to preserve key order, so _asdict() originally returned an OrderedDict to do so). If you don't need reliable key ordering in the output on those versions, changing the final line to:

print([dict(obj._asdict()) for obj in obj_data])

will give you plain dicts.
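
The quoting regex in the code above only matches runs of word characters, which is why it is flagged as needing tweaks for real data. If the actual values contain spaces, dots, or hyphens, one alternative is to skip JSON entirely and pull the brace groups out directly. This is a rough sketch, assuming values never contain commas or braces themselves; the `parse_groups` helper name is my own, not part of any library:

```python
import re
from collections import namedtuple

MyClass = namedtuple('MyClass', 'val_a val_b val_c')

def parse_groups(text):
    """Turn '[{a,b,c},{aa,bb,cc}]' into a list of MyClass instances.

    Hypothetical helper; assumes values contain no commas or braces.
    """
    result = []
    # Each match is the text between one pair of braces, e.g. 'a,b,c'
    for body in re.findall(r'\{([^{}]*)\}', text):
        fields = [field.strip() for field in body.split(',')]
        result.append(MyClass(*fields))  # Raises TypeError unless there are exactly 3 fields
    return result

print(parse_groups('[{a,b,c},{aa,bb,cc}]'))
print(parse_groups('[{x-1, y.2, hello world}]'))
```

Because MyClass takes exactly three positional arguments, a group with too few or too many values fails loudly instead of producing a half-filled record.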

If you don't want namedtuples for whatever reason (most commonly because you don't want the objects to behave as tuples, with length and iterability, or because you want them to be mutable), you can define a similar lightweight class with the dataclasses module; it's just a little more verbose:

from dataclasses import dataclass, asdict

@dataclass  # On 3.10+, use @dataclass(slots=True) if you want reduced memory per instance and an error instead of a silently created attribute when you assign to a non-existent one
class MyClass:
    val_a: str
    val_b: str
    val_c: str

That defines all the common stuff you need to use the rest of the code from before; the only change is replacing obj._asdict() with asdict(obj) (dataclasses made it a top-level function of the dataclasses module instead of a method on the type to avoid polluting the class's namespace). You never need the dict() wrapping that namedtuple requires pre-3.8, because:

1. dataclasses was only introduced in 3.7 (which has insertion-ordered plain dict), and
2. because of that, asdict() defaulted to dict from the beginning.
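
For completeness, here is what the whole thing might look like with the dataclass swapped in. It is the same parsing approach as the namedtuple version, with a couple of extra lines at the end to show the behavioral differences (mutability, no tuple behavior); treat it as a sketch rather than the one right way to do it:

```python
import json
import re
from dataclasses import dataclass, asdict

@dataclass
class MyClass:
    val_a: str
    val_b: str
    val_c: str

mystr = '[{a,b,c},{aa,bb,cc}]'

# Same string-to-JSON conversion as the namedtuple version
mystr = mystr.replace('{', '[').replace('}', ']')
quoted_str = re.sub(r'(\w+)', r'"\1"', mystr)

obj_data = [MyClass(*datagrp) for datagrp in json.loads(quoted_str)]
print(obj_data)

# asdict() is a module-level function, not a method on the instance
print([asdict(obj) for obj in obj_data])

# Unlike a namedtuple, instances are mutable...
obj_data[0].val_a = 'changed'
print(obj_data[0])

# ...and they are not sequences, so len() and iteration are unavailable
try:
    len(obj_data[0])
except TypeError:
    print('dataclass instances have no len()')
```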