How do I convert integer 'category' dtypes in a Pandas DataFrame to 'int64'/

Time:02-11

Take a look at the Pandas DataFrame here.

I have certain columns that are strings, and others that are integers/floats. However, all the columns in the dataset are currently formatted with a 'category' dtype.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29744 entries, 0 to 29743
Data columns (total 366 columns):
 #    Column      Non-Null Count  Dtype   
---   ------      --------------  -----   
 0    ASBG01      29360 non-null  category
 1    ASBG03      28726 non-null  category
 2    ASBG04      28577 non-null  category
 3    ASBG05A     29130 non-null  category
 4    ASBG05B     29055 non-null  category
 5    ASBG05C     29001 non-null  category
 6    ASBG05D     28938 non-null  category
 7    ASBG05E     28938 non-null  category
 8    ASBG05F     29030 non-null  category
 9    ASBG05G     28745 non-null  category
 10   ASBG05H     28978 non-null  category
 11   ASBG05I     28971 non-null  category
 12   ASBG06A     28956 non-null  category
 13   ASBG06B     28797 non-null  category
 14   ASBG07      28834 non-null  category
 15   ASBG08      28955 non-null  category
 16   ASBG09A     28503 non-null  category
 17   ASBG09B     27778 non-null  category
 18   ASBG10A     29025 non-null  category
 19   ASBG10B     28940 non-null  category
 ...
 363  ATDMDAT     13133 non-null  category
 364  ATDMMEM     25385 non-null  category
 365  Target      29744 non-null  float64 
dtypes: category(365), float64(1)
memory usage: 60.5 MB

How can I convert all the columns that hold integer/float values to actual integer/float dtypes?

Thanks.

CodePudding user response:

Suppose the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'cat_str': ['Hello', 'World'],
                   'cat_int': [0, 1],
                   'cat_float': [3.14, 2.71]}, dtype='category')
print(df.dtypes)

# Output
cat_str      category
cat_int      category
cat_float    category
dtype: object

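To convert only the columns whose categories are numeric, map each of those columns to the dtype of its categories and pass the mapping to astype: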

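# Only categorical columns have a .cat accessor, so restrict the loop to them.
dtypes = {col: df[col].cat.categories.dtype
          for col in df.select_dtypes(include='category').columns
          if np.issubdtype(df[col].cat.categories.dtype, np.number)}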

df = df.astype(dtypes)
print(df.dtypes)

# Output
cat_str      category
cat_int         int64
cat_float     float64
dtype: object
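
One caveat for a frame like the one in the question, where most non-null counts are below 29744: a categorical column with integer categories and missing values cannot be cast to int64, because NaN has no integer representation, and astype raises an error. Below is a minimal sketch of one workaround, assuming pandas' nullable 'Int64' dtype is acceptable for those columns. The frame and the helper name numeric_category_dtypes are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical frame: two categorical columns with missing values
# plus a plain float column, loosely mirroring the question's layout.
df = pd.DataFrame({
    'cat_str': pd.Categorical(['Hello', None, 'World']),
    'cat_int': pd.Categorical([0, 1, None]),
    'Target': [1.0, 0.0, 1.0],
})

def numeric_category_dtypes(frame):
    """Map each numeric categorical column to a dtype that can hold its values."""
    dtypes = {}
    # Non-categorical columns have no .cat accessor, so skip them up front.
    for col in frame.select_dtypes(include='category').columns:
        cat_dtype = frame[col].cat.categories.dtype
        if not is_numeric_dtype(cat_dtype):
            continue  # string categories stay categorical
        if is_integer_dtype(cat_dtype) and frame[col].isna().any():
            # Assumption: the nullable integer dtype is fine for columns with gaps.
            dtypes[col] = 'Int64'
        else:
            dtypes[col] = cat_dtype
    return dtypes

converted = df.astype(numeric_category_dtypes(df))
print(converted.dtypes)
```

Float categories need no special handling here, since float64 can represent NaN directly.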

Or if you want to remove all category dtypes, use:

dtypes = {col: df[col].cat.categories.dtype
          for col in df.select_dtypes(include='category').columns}

df = df.astype(dtypes)
print(df.dtypes)

# Output
cat_str       object
cat_int        int64
cat_float    float64
dtype: object
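
If the numbers were read in as text, the categories are strings such as '1' and '2', so the numeric check never matches and no column becomes int64 or float64. Here is a rough sketch for that case, assuming it is acceptable to treat any categorical column whose values all parse as numbers as numeric. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose categories are numeric-looking strings.
df = pd.DataFrame({'codes': ['1', '2', '1'],
                   'labels': ['a', 'b', 'a']}, dtype='category')

for col in df.select_dtypes(include='category').columns:
    try:
        # Drop the categorical wrapper first, then let pandas parse the values.
        df[col] = pd.to_numeric(df[col].astype(object))
    except (ValueError, TypeError):
        # At least one value is not a number; leave the column categorical.
        pass

print(df.dtypes)
```

Columns with missing values come out of pd.to_numeric as float64, so a further cast is needed if an integer dtype is required.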