Changing the dtype of category codes in pandas

Time:06-02

Let's say I have a boolean column stored as a category in a pandas.DataFrame. But there's a twist - the underlying values are str, not bool. I.e., the values are "True"/"False", not True/False.

How do I:

  1. change the dtype of the underlying category values (e.g. from "True" to True) and
  2. continue storing the field as a category?

Having the boolean values as strings is an issue with DataFrame.query, for example. I have to specify DataFrame.query("field == 'True'"), which is pretty horrendous lol.

FYI - I don't want to do DataFrame.astype(dict(field=bool)), because then I lose the memory efficiency from category. I want to keep the category dtype.

CodePudding user response:

Maybe you can try:

df['field'] = df['field'].replace({'True': True, 'False': False})
print(df['field'])

# Output
0    False
1     True
2     True
3    False
Name: field, dtype: category
Categories (2, object): [False, True]  # <- bool

With query:

>>> df.query('field == True')
  field
1  True
2  True

Setup:

df = pd.DataFrame({'field': ['False', 'True', 'True', 'False']}, dtype='category')
print(df['field'])

# Output
0    False
1     True
2     True
3    False
Name: field, dtype: category
Categories (2, object): ['False', 'True']  # <- str
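
As far as I know, newer pandas releases warn when replace is used to change the categories of a categorical column and suggest cat.rename_categories instead. A minimal sketch of that variant, reusing the setup above; the engine='python' argument is my own addition so the example doesn't depend on numexpr being installed, and can probably be dropped:

```python
import pandas as pd

df = pd.DataFrame({'field': ['False', 'True', 'True', 'False']}, dtype='category')

# Relabel the two categories from str to bool. Only the category labels
# change; the column keeps its category dtype.
df['field'] = df['field'].cat.rename_categories({'True': True, 'False': False})

# The query no longer needs the quoted string.
# engine='python' is an assumption for portability, not a requirement.
matches = df.query('field == True', engine='python')
print(matches)
```

This only renames the labels, so it should also be cheap on a large column, since the per-row codes don't need to be rebuilt.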

CodePudding user response:

You could try the following (the values can be used as bools, but the column's dtype is still reported as category):

import pandas as pd

# before
data = ['True', 'False', 'True']
df = pd.DataFrame({'data': data}).astype("category")

print('[BEFORE] \n data type = {0} \n values : {1}'.format(df['data'].dtypes, df.values))

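# after
# Note: bool('False') is True, because any non-empty string is truthy,
# so compare against the string instead of calling bool() on it.
df['data'] = [value == 'True' for value in df['data']]
df = df.astype("category")

print('[AFTER] \n data type = {0} \n values : {1}'.format(df['data'].dtypes, df.values))

output (the exact spacing inside the printed array may vary with the pandas version):

[BEFORE] 
 data type = category
 values : [['True']
 ['False']
 ['True']]
[AFTER]
 data type = category
 values : [[True]
 [False]
 [True]]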