pd.read_csv gives entire data in object dtype. How do I convert to int type?

Time:11-26

I am trying to read a particular CSV file (plane-data.csv), but the entire df comes out as object dtype. I need 'year' to be an integer type so that I can perform calculations.

Please take a look at my screenshot

My dataset is from plane-data.csv link

Would really love to have some help, I have been searching the entire internet for 6 hours with no progress. Thank you !

Initially, I tried

import pandas as pd
df = pd.read_csv('plane-data.csv')
columns =  ['type', 'manufacturer', 'issue_date', 'model', 'status', 'aircraft_type', 'engine_type']
df.drop(columns, axis=1, inplace=True)
df.dropna(inplace=True)

df['year'] = df['year'].astype(int)

and got

ValueError: invalid literal for int() with base 10: 'None'

Which I have found to be the result of NaN values.

I have cleared all null values and tried using

df['year'] = df['year'].astype(str).astype('Int64')

which I took from other SO posts where it seemed to work, but it does not work for me. I got

TypeError: object cannot be converted to an IntegerDtype

CodePudding user response:

You get the following error:

TypeError: 'method' object is not subscriptable

because you used [] instead of () in df['year'] = df['year'].astype[int]. You should use df['year'] = df['year'].astype(int)

CodePudding user response:

Since the year column contains a string value (literally None), pandas treats the whole column as object. You can handle that by passing na_values=['None'] as an argument to pandas.read_csv:

df = pd.read_csv('plane-data.csv', na_values=['None'])
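To see the effect in isolation, here is a minimal sketch that reads a small in-memory CSV instead of plane-data.csv. The sample rows and the csv_text name are made up for illustration; only the na_values argument comes from the answer above.

```python
import io
import pandas as pd

# Hypothetical sample standing in for plane-data.csv; the rows are invented.
csv_text = "tailnum,year\nN001AA,1998\nN002BB,None\nN003CC,2004\n"

# Treat the literal string 'None' as a missing value while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=["None"])

year_dtype = str(df["year"].dtype)       # numeric column, NaN where 'None' was
missing = int(df["year"].isna().sum())   # number of rows that held 'None'
print(year_dtype, missing)
```

The column is parsed as float rather than int because NaN has no representation in a plain NumPy integer column.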

Or you can use pandas.to_numeric:

df = pd.read_csv('plane-data.csv')

df['year'] = pd.to_numeric(df['year'], errors='coerce')  # values that cannot be parsed become NaN

# Output :

print(df.dtypes)

tailnum           object
type              object
manufacturer      object
issue_date        object
model             object
status            object
aircraft_type     object
engine_type       object
year             float64
dtype: object
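The output above leaves year as float64, while the question asks for an integer type. One possible way to get there, sketched on made-up values rather than the real file, is either to cast to the nullable Int64 dtype (missing rows stay as <NA>) or to drop the missing rows and cast to a plain int. The sample data and the nullable_year / plain_year names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values standing in for the 'year' column of plane-data.csv.
df = pd.DataFrame({"year": ["1998", "None", "2004"]})

# Step from the answer: unparseable strings such as 'None' become NaN.
df["year"] = pd.to_numeric(df["year"], errors="coerce")

# Option 1: nullable integer dtype, which keeps the missing row as <NA>.
nullable_year = df["year"].astype("Int64")

# Option 2: drop the missing rows first, then cast to a plain integer dtype.
plain_year = df["year"].dropna().astype(int)

print(nullable_year.dtype)
print(plain_year.tolist())
```

Option 1 keeps every row, which matters if the other columns are still needed; option 2 shortens the Series, so assigning it back to df would require dropping the same rows from the frame first.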