Take a look at the pandas DataFrame below.
Some of its columns hold strings, and others hold integers/floats. However, every column in the dataset currently has the 'category' dtype.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29744 entries, 0 to 29743
Data columns (total 366 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ASBG01 29360 non-null category
1 ASBG03 28726 non-null category
2 ASBG04 28577 non-null category
3 ASBG05A 29130 non-null category
4 ASBG05B 29055 non-null category
5 ASBG05C 29001 non-null category
6 ASBG05D 28938 non-null category
7 ASBG05E 28938 non-null category
8 ASBG05F 29030 non-null category
9 ASBG05G 28745 non-null category
10 ASBG05H 28978 non-null category
11 ASBG05I 28971 non-null category
12 ASBG06A 28956 non-null category
13 ASBG06B 28797 non-null category
14 ASBG07 28834 non-null category
15 ASBG08 28955 non-null category
16 ASBG09A 28503 non-null category
17 ASBG09B 27778 non-null category
18 ASBG10A 29025 non-null category
19 ASBG10B 28940 non-null category
...
363 ATDMDAT 13133 non-null category
364 ATDMMEM 25385 non-null category
365 Target 29744 non-null float64
dtypes: category(365), float64(1)
memory usage: 60.5 MB
How can I convert all the columns that hold integer/float values to actual integer/float dtypes?
Thanks.
Answer:
Suppose the following dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'cat_str': ['Hello', 'World'],
                   'cat_int': [0, 1],
                   'cat_float': [3.14, 2.71]}, dtype='category')
print(df.dtypes)
# Output
cat_str category
cat_int category
cat_float category
dtype: object
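You can convert only the columns whose categories are numeric, using the dtype of the categories themselves:
# Map each numeric categorical column to the dtype of its categories
dtypes = {col: df[col].cat.categories.dtype for col in df.columns
          if np.issubdtype(df[col].cat.categories.dtype, np.number)}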
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str category
cat_int int64
cat_float float64
dtype: object
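One caveat for the data in the question: the info() output shows missing values in most columns, and casting a categorical that contains NaN to a plain int64 raises an error. A minimal sketch of a NaN-tolerant variant follows, assuming that pandas' nullable 'Int64' dtype is acceptable for integer columns with gaps. The example frame is hypothetical, and the type checks use pd.api.types helpers instead of np.issubdtype.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical example data: 'cat_int' has integer categories and one
# missing value (code -1 means "missing" in Categorical.from_codes).
df = pd.DataFrame({
    'cat_str': pd.Categorical(['Hello', 'World', 'Hello']),
    'cat_int': pd.Categorical.from_codes([0, 1, -1], categories=[0, 1]),
    'cat_float': pd.Categorical([3.14, 2.71, 3.14]),
})

for col in df.columns:
    cat_dtype = df[col].cat.categories.dtype
    if not is_numeric_dtype(cat_dtype):
        continue  # leave non-numeric categoricals untouched
    if is_integer_dtype(cat_dtype) and df[col].isna().any():
        # int64 cannot hold NaN: go through float64, then to nullable Int64
        df[col] = df[col].astype('float64').astype('Int64')
    else:
        df[col] = df[col].astype(cat_dtype)

print(df.dtypes)
```

Integer columns without missing values still become int64; only the ones with gaps end up as 'Int64'. If you would rather keep everything as NumPy dtypes, casting those columns to float64 alone is the other option.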
Or if you want to remove all category dtypes, use:
dtypes = {col: df[col].cat.categories.dtype for col in df.columns}
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str object
cat_int int64
cat_float float64
dtype: object
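Both snippets above rely on the categories already having a numeric dtype. If the numbers in your data are stored as strings (for example '1', '2'), the categories are not numeric and those columns would be skipped. A rough sketch for that case, assuming string-encoded numbers and using a hypothetical example frame, is to try pd.to_numeric on each column and keep the result only when parsing succeeds:

```python
import pandas as pd

# Hypothetical example data: numbers stored as string categories
df = pd.DataFrame({
    'num_as_str': ['1', '2', '3'],
    'text': ['a', 'b', 'c'],
}).astype('category')

for col in df.columns:
    try:
        # Drop the categorical wrapper, then attempt a numeric parse
        converted = pd.to_numeric(df[col].astype(object))
    except (ValueError, TypeError):
        continue  # not parseable as numbers: keep the column as category
    df[col] = converted

print(df.dtypes)
```

Columns that fail to parse stay categorical, so this can be run over all 366 columns without listing them by hand.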