I am trying to read a particular CSV (plane-data.csv), but every column in the df comes out as object dtype. I need 'year' to be an integer type so that I can perform calculations.
Please take a look at my screenshot.
My dataset is from plane-data.csv link
I would really love some help. I have been searching the internet for 6 hours with no progress. Thank you!
Initially, I tried
import pandas as pd
df = pd.read_csv('plane-data.csv')
columns = ['type', 'manufacturer', 'issue_date', 'model', 'status', 'aircraft_type', 'engine_type']
df.drop(columns, axis=1, inplace=True)
df.dropna(inplace=True)
df['year'] = df['year'].astype(int)
and got
ValueError: invalid literal for int() with base 10: 'None'
I found this to be the result of NaN values.
I cleared all null values and then tried using
df['year'] = df['year'].astype(str).astype('Int64')
from other SO posts, where it seems to work for them but not for me. I got
TypeError: object cannot be converted to an IntegerDtype
CodePudding user response:
You get the following error:
TypeError: 'method' object is not subscriptable
because you used [] instead of () in df['year'] = df['year'].astype[int]. You should use df['year'] = df['year'].astype(int) instead.
CodePudding user response:
Since the column year contains a string value (literally None), pandas treats the whole column as object. You can handle that by simply passing na_values=['None'] as an argument to pandas.read_csv:
df = pd.read_csv('plane-data.csv', na_values=['None'])
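To try this without the real file, here is a minimal sketch that reads a small inline CSV instead. The sample rows and the name csv_text are made up for illustration and are not taken from plane-data.csv. Newer pandas versions may already treat None as missing by default, so passing na_values explicitly just makes the behaviour the same everywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for plane-data.csv: the year is missing in two rows,
# once as the literal text "None" and once as an empty field.
csv_text = "tailnum,year\nN10156,2004\nN102UW,1998\nN103US,None\nN104UA,\n"

# Tell the parser that the text "None" means "missing".
clean = pd.read_csv(io.StringIO(csv_text), na_values=["None"])

print(clean["year"].dtype)         # float64
print(clean["year"].isna().sum())  # 2
```

The column is numeric now, but it is float64 rather than an integer type, because NaN cannot be stored in a plain int column.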
Or, you can use pandas.to_numeric:
df = pd.read_csv('plane-data.csv')
df['year'] = pd.to_numeric(df['year'], errors='coerce')  # invalid parsing will be set as NaN
# Output :
print(df.dtypes)
tailnum object
type object
manufacturer object
issue_date object
model object
status object
aircraft_type object
engine_type object
year float64
dtype: object
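Note that year is float64 at this point, not an integer. Two possible ways to get from there to an integer dtype are sketched below; this is only an illustration under assumptions, and the three sample rows are hypothetical rather than taken from plane-data.csv. Option A keeps every row by using the nullable Int64 dtype, while option B drops the rows with no year and then uses plain int.

```python
import pandas as pd

# Hypothetical sample standing in for two columns of plane-data.csv.
df = pd.DataFrame({
    "tailnum": ["N10156", "N102UW", "N103US"],
    "year": ["2004", "None", "1998"],
})

# The text 'None' cannot be parsed as a number, so it becomes NaN.
df["year"] = pd.to_numeric(df["year"], errors="coerce")

# Option A: keep all rows; missing years become <NA> in a nullable integer column.
nullable = df["year"].astype("Int64")

# Option B: drop the rows without a year, then convert to a plain integer dtype.
dropped = df.dropna(subset=["year"]).copy()
dropped["year"] = dropped["year"].astype(int)

print(nullable.dtype)           # Int64
print(dropped["year"].tolist()) # [2004, 1998]
```

Option A is probably the safer choice if the other columns of those rows still matter, since nothing is thrown away.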