pandas read_json dtype=pd.CategoricalDtype does not work but dtype='category' does

Time:01-12

Is it a known issue that passing pd.CategoricalDtype as the dtype in read_json does not convert the column, or is there a mistake in my code?

import pandas as pd

df = pd.read_json(
    "./data/data.json",
    dtype={
        #"facility": pd.CategoricalDtype, # does not work
        "facility": 'category',           # does work
        "supplier": pd.CategoricalDtype,  # does not work
    }
)
df.info()
-----
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   facility      232 non-null    category      
 3   supplier      111 non-null    object     

Environment

MacOS 13.0.1 (22A400)
$ python --version
Python 3.9.13
$ pip list | grep pandas
pandas                      1.5.2

CodePudding user response:

According to the documentation:

Since dtype='category' is essentially CategoricalDtype(None, False), and since all instances of CategoricalDtype compare equal to 'category', all instances of CategoricalDtype compare equal to a CategoricalDtype(None, False), regardless of categories or ordered.
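
Note that the quote talks about instances of CategoricalDtype, while the question passes the class itself. A small sketch of the difference (the category values "a" and "b" are made up for illustration):

```python
import pandas as pd

# An instance compares equal to the string alias, even with
# explicit categories and ordering.
dtype_instance = pd.CategoricalDtype(["a", "b"], ordered=True)
instance_matches = dtype_instance == "category"

# The bare class is an ordinary Python type, not a dtype instance,
# so it does not compare equal to the alias.
class_matches = pd.CategoricalDtype == "category"
class_is_instance = isinstance(pd.CategoricalDtype, pd.CategoricalDtype)

print(instance_matches, class_matches, class_is_instance)
```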

Try passing an instance of CategoricalDtype (note the parentheses) rather than the class itself:

"supplier": pd.CategoricalDtype()
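
A minimal, self-contained sketch of that suggestion. The inline JSON below is hypothetical and stands in for ./data/data.json, which is not shown in the question. io.StringIO is used so the snippet does not depend on a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample records; the real data.json is not part of the question.
raw = (
    '[{"facility": "A", "supplier": "X"},'
    ' {"facility": "B", "supplier": "Y"},'
    ' {"facility": "A", "supplier": "X"}]'
)

df = pd.read_json(
    io.StringIO(raw),
    dtype={
        "facility": "category",             # string alias
        "supplier": pd.CategoricalDtype(),  # instance, not the class
    },
)

print(df.dtypes)
```

If a column still comes back as object, converting it after loading with df["supplier"].astype("category") is another option.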