Is it a known issue that passing pd.CategoricalDtype as a dtype to read_json does not convert the column, or is there a mistake in my code?
import pandas as pd

df = pd.read_json(
    "./data/data.json",
    dtype={
        # "facility": pd.CategoricalDtype,  # does not work
        "facility": "category",  # does work
        "supplier": pd.CategoricalDtype,  # does not work
    },
)
df.info()
-----
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   facility  232 non-null    category
 3   supplier  111 non-null    object
Environment
macOS 13.0.1 (22A400)
$ python --version
Python 3.9.13
$ pip list | grep pandas
pandas 1.5.2
CodePudding user response:
According to the documentation:
Since dtype='category' is essentially CategoricalDtype(None, False), and since all instances CategoricalDtype compare equal to 'category', all instances of CategoricalDtype compare equal to a CategoricalDtype(None, False), regardless of categories or ordered.
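The part of that quote about the string 'category' can be checked directly. The snippet below is only a small sketch and not taken from the documentation; the variable names are made up for illustration. It shows that an instance compares equal to the alias while the bare class does not:

```python
import pandas as pd

# Calling the class creates a dtype instance with no categories set.
dtype_instance = pd.CategoricalDtype()

# Instances compare equal to the string alias 'category'.
instance_equals_alias = dtype_instance == "category"

# The bare class is a plain Python type, not a dtype instance,
# so it does not compare equal to the alias.
class_equals_alias = pd.CategoricalDtype == "category"

print(instance_equals_alias)
print(class_equals_alias)
```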
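In your code, pd.CategoricalDtype is the class itself rather than a dtype instance, which is why the column is left as object. Pass an instance instead (note the parentheses):
"supplier": pd.CategoricalDtype()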