Python: How to append individual dictionaries to Series data structure

Time:11-01

My goal is to create an output that has the Series datatype and looks like the following:

(image of the desired output not included)

I tried to achieve this by using the code below:

series_structure = pd.Series()

for i in table_dtypes[0]:
    if i == "object":
        type_dict = {'type': 'categorical'}
        series_structure.append(type_dict)
    elif i == "boolean":
        type_dict = {'type': 'boolean'}
        series_structure.append(type_dict)
    elif i == "datetime64": # revisit here 
        type_dict = {'type': 'datetime', 'format': '%Y-%m-%d'}
        series_structure.append(type_dict)
    elif i == "int64":
        type_dict = {'type': 'id', 'subtype': 'integer'}
        series_structure.append(type_dict)
    elif i == "float64": # revisit here 
        type_dict = {'type': 'numerical', 'subtype': 'float'}
        series_structure.append(type_dict)

But I get the error below:

TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid

For reference, my input dataset (table_dtypes) looks like this: (image not included)

What can I do?

CodePudding user response:

You seem to be confusing list.append with Series.append. According to the documentation, the latter expects a "Series or list/tuple of Series", hence the error when you pass it a dict. Apart from that, (1) the method is deprecated (since pandas 1.4, and removed in pandas 2.0), and (2) "growing" a df or Series row by row is generally ill-advised (see this post).
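
To make the difference concrete, here is a minimal sketch. The sample values and the names records, s1, s2, combined and concat_error are made up for illustration. A plain list accepts any object via append, whereas pandas combines Series objects with pd.concat, which is the replacement for the deprecated Series.append and rejects a bare dict in the same way:

```python
import pandas as pd

# list.append accepts any object, including a dict
records = []
records.append({'type': 'categorical'})

# pandas joins Series objects with pd.concat (the replacement for Series.append)
s1 = pd.Series([{'type': 'categorical'}])
s2 = pd.Series([{'type': 'boolean'}])
combined = pd.concat([s1, s2], ignore_index=True)

# handing pd.concat a bare dict raises a TypeError, much like the one in the question
try:
    pd.concat([s1, {'type': 'boolean'}])
    concat_error = None
except TypeError as exc:
    concat_error = str(exc)
```

Note that pd.concat returns a new object each time, so calling it inside a loop still copies the data on every iteration.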

One remedy could be to append to a list, and then use it as input for pd.Series, as suggested in the answer by Z Li.

Perhaps a better approach would be to get rid of the entire loop, and simply create a dict with the values from your if/elif-statements as keys, and the appropriate dicts as values. You can then simply use Series.map to achieve the desired result. E.g.:

my_dict = {'object': {'type': 'categorical'},
           'boolean': {'type': 'boolean'},
           'datetime64': {'type': 'datetime', 'format': '%Y-%m-%d'},
           'int64': {'type': 'id', 'subtype': 'integer'},
           'float64': {'type': 'numerical', 'subtype': 'float'}}

series_structure = table_dtypes[0].map(my_dict)

print(series_structure)

0    {'type': 'id', 'subtype': 'integer'}
1    {'type': 'id', 'subtype': 'integer'}
2                 {'type': 'categorical'}
3                 {'type': 'categorical'}
Name: 0, dtype: object
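
For anyone who wants to run this end to end, below is a self-contained sketch. The table_dtypes frame here is a hypothetical stand-in for the one in the question (the real one was only shown as an image), and the 'timedelta64' entry is added purely to illustrate what happens to a dtype name that is missing from the dict: Series.map leaves NaN in its place, so it can be worth checking for such gaps.

```python
import pandas as pd

# Hypothetical stand-in for table_dtypes: column 0 holds dtype names as strings
table_dtypes = pd.DataFrame({0: ['int64', 'int64', 'object', 'object', 'timedelta64']})

my_dict = {'object': {'type': 'categorical'},
           'boolean': {'type': 'boolean'},
           'datetime64': {'type': 'datetime', 'format': '%Y-%m-%d'},
           'int64': {'type': 'id', 'subtype': 'integer'},
           'float64': {'type': 'numerical', 'subtype': 'float'}}

# look up every dtype name in the dict in one vectorised call
series_structure = table_dtypes[0].map(my_dict)

# dtype names without an entry in my_dict come back as NaN; collect them for review
unmapped = table_dtypes[0][series_structure.isna()].unique().tolist()
```

With this sample input, unmapped contains only 'timedelta64'; an empty list would mean every dtype name was covered.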

CodePudding user response:

You can collect the dicts in a plain list and create the pd.Series once, after the loop, which should also be faster:

series_structure = []

for i in table_dtypes[0]:
    if i == "object":
        type_dict = {'type': 'categorical'}
        series_structure.append(type_dict)
    # ...

series_structure = pd.Series(series_structure)
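
A compact, runnable variant of the same collect-then-construct idea might look like the sketch below. It is not the code from the answer above: the sample table_dtypes frame and the names type_for_dtype and collected are assumptions made for illustration, and the if/elif chain is swapped for a dict lookup to keep the example short.

```python
import pandas as pd

# Hypothetical stand-in for table_dtypes: column 0 holds dtype names as strings
table_dtypes = pd.DataFrame({0: ['int64', 'object', 'float64']})

# one entry per branch of the if/elif chain in the question
type_for_dtype = {'object': {'type': 'categorical'},
                  'boolean': {'type': 'boolean'},
                  'datetime64': {'type': 'datetime', 'format': '%Y-%m-%d'},
                  'int64': {'type': 'id', 'subtype': 'integer'},
                  'float64': {'type': 'numerical', 'subtype': 'float'}}

# build an ordinary list first; dict(...) copies so rows do not share one dict object
collected = [dict(type_for_dtype[name]) for name in table_dtypes[0]]

# construct the Series a single time, reusing the original index
series_structure = pd.Series(collected, index=table_dtypes.index)
```

Unlike Series.map, the direct lookup raises a KeyError for a dtype name that has no entry, which may or may not be the behaviour you want.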
