What is the difference between int64 and category column type in pandas

Time:02-23

I have this code:

import pandas as pd

data = {
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Sex': ['Male', 'Female', 'Male', 'Female'],
    'Age': [20, 21, 19, 18],
}
df = pd.DataFrame(data)
df['Sex'] = df['Sex'].astype('category')
df.info()
int_cols = [1 if x == 'int64' else 0 for x in df.dtypes]
print(int_cols)
cat_cols = [1 if x == 'category' else 0 for x in df.dtypes]
print(cat_cols)

When I run it, I get an error on the line cat_cols = [1 if x == 'category' else 0 for x in df.dtypes], but not on int_cols = [1 if x == 'int64' else 0 for x in df.dtypes].

What is the difference between them? How can I change the code above so that it works and gives me a list of 1s and 0s marking which columns are category?

CodePudding user response:

Try comparing the dtype's name instead of the dtype object itself:

>>> [1 if x.name=="category" else 0 for x in df.dtypes]
[0, 1, 0]
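
As for why only the second line fails, the likely explanation is this: df.dtypes holds NumPy dtype objects for the Name and Age columns, and comparing a NumPy dtype with a string makes NumPy try to interpret that string as a dtype. 'int64' is a valid NumPy dtype name, but 'category' is a pandas extension type that NumPy does not know, so some NumPy versions raise a TypeError on that comparison (newer ones may just return False). Comparing x.name, which is a plain string, avoids the problem. Below is a minimal sketch of two other ways to get the same result, assuming a reasonably recent pandas; the variable names cat_flags and cat_names are only for illustration.

```python
import pandas as pd

data = {
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Sex': ['Male', 'Female', 'Male', 'Female'],
    'Age': [20, 21, 19, 18],
}
df = pd.DataFrame(data)
df['Sex'] = df['Sex'].astype('category')

# Option 1: check the dtype's type instead of comparing it to a string.
cat_flags = [1 if isinstance(dtype, pd.CategoricalDtype) else 0
             for dtype in df.dtypes]
print(cat_flags)  # [0, 1, 0]

# Option 2: let pandas pick the categorical columns by name.
cat_names = df.select_dtypes(include='category').columns.tolist()
print(cat_names)  # ['Sex']
```

If you only need the column names and not the 1/0 flags, select_dtypes is probably the simpler of the two.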