My goal is to create a Series with the following output:
I tried to achieve this by using the code below:
import pandas as pd

series_structure = pd.Series()
for i in table_dtypes[0]:
    if i == "object":
        type_dict = {'type': 'categorical'}
        series_structure.append(type_dict)
    elif i == "boolean":
        type_dict = {'type': 'boolean'}
        series_structure.append(type_dict)
    elif i == "datetime64":  # revisit here
        type_dict = {'type': 'datetime', 'format': '%Y-%m-%d'}
        series_structure.append(type_dict)
    elif i == "int64":
        type_dict = {'type': 'id', 'subtype': 'integer'}
        series_structure.append(type_dict)
    elif i == "float64":  # revisit here
        type_dict = {'type': 'numerical', 'subtype': 'float'}
        series_structure.append(type_dict)
But I get the error below:
TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid
For reference, my input dataset (table_dtypes) looks like this:
What can I do?
Answer:
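You seem to be confusing list.append with Series.append. As per the documentation, the latter expects a "Series or list/tuple of Series", hence the error. Apart from that, (1) the method is deprecated, and was removed entirely in pandas 2.0; (2) unlike list.append, Series.append returned a new Series instead of modifying the existing one in place, so your loop discarded every result; and (3) "growing" a df or Series row-wise is generally an ill-advised practice (see this post).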
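To illustrate the error: pd.concat, the suggested replacement for the removed Series.append, likewise accepts only Series and DataFrame objects, so a bare dict fails in the same way, while wrapping the dict in a one-element Series makes it a valid operand. This is a minimal sketch for illustration only; as noted above, growing a Series like this copies the existing data on every call and is best avoided inside a loop.

```python
import pandas as pd

s = pd.Series([{'type': 'id', 'subtype': 'integer'}])
d = {'type': 'categorical'}

# A bare dict is not a Series/DataFrame, so concat rejects it with a
# TypeError like the one in the question.
try:
    pd.concat([s, d])
except TypeError as exc:
    print("TypeError:", exc)

# Wrapped in a one-element Series, the dict becomes a valid operand.
# ignore_index=True renumbers the result 0..n-1.
s = pd.concat([s, pd.Series([d])], ignore_index=True)
print(s)
```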
One remedy would be to append to a list and then use that list as input for pd.Series, as suggested in the answer by Z Li.
A better approach is perhaps to get rid of the loop entirely: create a dict with the values from your if/elif statements as keys and the corresponding dicts as values, then use Series.map to get the desired result. E.g.:
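import pandas as pd

# Keys: the dtype names tested in the if/elif chain; values: the dicts to emit.
my_dict = {'object': {'type': 'categorical'},
           'boolean': {'type': 'boolean'},
           'datetime64': {'type': 'datetime', 'format': '%Y-%m-%d'},
           'int64': {'type': 'id', 'subtype': 'integer'},
           'float64': {'type': 'numerical', 'subtype': 'float'}}

# Look up each dtype name in my_dict; names without a key become NaN.
series_structure = table_dtypes[0].map(my_dict)
print(series_structure)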
0 {'type': 'id', 'subtype': 'integer'}
1 {'type': 'id', 'subtype': 'integer'}
2 {'type': 'categorical'}
3 {'type': 'categorical'}
Name: 0, dtype: object
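A self-contained sketch of the Series.map approach follows. Because the question does not show table_dtypes, the input here is hypothetical: it assumes table_dtypes[0] holds dtype names as strings, as produced by df.dtypes.astype(str) on a made-up DataFrame. In that case pandas reports NumPy boolean columns as 'bool' ('boolean' is the name of the nullable extension dtype) and datetime columns with a unit suffix such as 'datetime64[ns]', so keys like 'boolean' or 'datetime64' would not match exactly; the sketch therefore uses 'bool' and leaves datetimes out.

```python
import pandas as pd

# Hypothetical source DataFrame; the real data is not shown in the question.
df = pd.DataFrame({
    'a': pd.Series([1, 2], dtype='int64'),
    'b': pd.Series([0.5, 1.5], dtype='float64'),
    'c': pd.Series(['x', 'y'], dtype=object),
    'd': pd.Series([True, False], dtype=bool),
})

# One column (named 0) of dtype names as strings: 'int64', 'float64', ...
table_dtypes = pd.DataFrame({0: df.dtypes.astype(str).tolist()})

my_dict = {'object': {'type': 'categorical'},
           'bool': {'type': 'boolean'},  # NumPy bool dtype is named 'bool'
           'int64': {'type': 'id', 'subtype': 'integer'},
           'float64': {'type': 'numerical', 'subtype': 'float'}}

# Each dtype name is looked up in my_dict; unmatched names would become NaN.
series_structure = table_dtypes[0].map(my_dict)
print(series_structure)
```

Any dtype name without a key in my_dict comes back as NaN, so unhandled dtypes are easy to spot, e.g. with series_structure.isna().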
Answer:
You can collect the dicts in a plain list and create the pd.Series once, at the end of the loop, which should also be faster:
import pandas as pd

series_structure = []  # a plain list: list.append accepts dicts
for i in table_dtypes[0]:
    if i == "object":
        type_dict = {'type': 'categorical'}
        series_structure.append(type_dict)
    # ...
series_structure = pd.Series(series_structure)  # build the Series once
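The list-based approach can be written out as a self-contained sketch using the branches from the question. The input table_dtypes is hypothetical (chosen to match the four rows of output in the first answer), and the final else branch is an addition not in the original: it appends None for unrecognised dtype names, so every row of table_dtypes still gets exactly one entry and the resulting Series lines up with the input.

```python
import pandas as pd

# Hypothetical input: a single column (named 0) of dtype names as strings.
table_dtypes = pd.DataFrame({0: ['int64', 'int64', 'object', 'object']})

rows = []  # a plain list: list.append accepts any object, including dicts
for i in table_dtypes[0]:
    if i == "object":
        rows.append({'type': 'categorical'})
    elif i == "boolean":
        rows.append({'type': 'boolean'})
    elif i == "datetime64":
        rows.append({'type': 'datetime', 'format': '%Y-%m-%d'})
    elif i == "int64":
        rows.append({'type': 'id', 'subtype': 'integer'})
    elif i == "float64":
        rows.append({'type': 'numerical', 'subtype': 'float'})
    else:
        rows.append(None)  # added: keep one entry per input row

# Build the Series once, after the loop, instead of growing it row by row.
series_structure = pd.Series(rows)
print(series_structure)
```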