I have a CSV file as shown below:
db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
abc,2020-06-03 00:00:00.000,52,20,36,xyz
abc,2020-06-04 00:00:00.000,37,12,17,xyz
abc,2020-06-05 00:00:00.000,57,23,39,xyz
abc,2020-06-08 00:00:00.000,37,13,28,xyz
abc,2020-06-09 00:00:00.000,48,16,31,xyz
abc,2020-06-10 00:00:00.000,60,24,40,xyz
abc,2020-06-11 00:00:00.000,35,11,,xyz
I am reading it with pandas (pd.read_csv). Since there are two NaN values here, the columns RequestCount and PerformedCount are converted to float64.
But I need all the numeric columns to be of type integer.
How can I achieve this?
I tried astype(int), but it fails on the NaN values.
CodePudding user response:
If you are using pandas ≥1.0, you can benefit from the new nullable integer type:
df['RequestCount'] = df['RequestCount'].astype('Int64')
NB: note the capital I in Int64.
output:
db date RequestCount ScheduledCount PerformedCount Product
0 abc 2020-06-01 00:00:00.000 51 22 37.0 xyz
1 abc 2020-06-02 00:00:00.000 <NA> 11 19.0 xyz
2 abc 2020-06-03 00:00:00.000 52 20 36.0 xyz
3 abc 2020-06-04 00:00:00.000 37 12 17.0 xyz
4 abc 2020-06-05 00:00:00.000 57 23 39.0 xyz
5 abc 2020-06-08 00:00:00.000 37 13 28.0 xyz
6 abc 2020-06-09 00:00:00.000 48 16 31.0 xyz
7 abc 2020-06-10 00:00:00.000 60 24 40.0 xyz
8 abc 2020-06-11 00:00:00.000 35 11 NaN xyz
>>> df['RequestCount'].isna()
0 False
1 True
2 False
...
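The example above converts only RequestCount, which is why PerformedCount still shows as float in the output. A minimal sketch of applying the same nullable type to all three count columns follows; it assumes a shortened inline copy of the question's CSV (CSV_TEXT and count_cols are names chosen here for illustration) and a pandas version that supports Int64.

```python
import io

import pandas as pd

# Shortened copy of the CSV from the question (illustrative subset of rows).
CSV_TEXT = """db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
abc,2020-06-11 00:00:00.000,35,11,,xyz
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Columns with a missing value are read as float64; the others as int64.
count_cols = ["RequestCount", "ScheduledCount", "PerformedCount"]

# astype returns a new DataFrame, so the result has to be assigned back.
# 'Int64' (capital I) is the nullable integer dtype; missing values stay missing.
df = df.astype({col: "Int64" for col in count_cols})

print(df.dtypes)
```

Missing entries remain missing after the conversion, so df['RequestCount'].isna() still flags them.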
CodePudding user response:
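Use this (note that it replaces the missing values with 0 instead of keeping them as missing):
import pandas as pd
df = pd.read_csv('my_file.csv')
# Replace every NaN with 0 so the columns can hold plain integers
df = df.fillna(0)
# astype returns a new DataFrame, so assign the result back
df = df.astype({'RequestCount': 'Int64', 'ScheduledCount': 'Int64', 'PerformedCount': 'Int64'})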
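If turning missing counts into 0 is not acceptable, one possible alternative is to request the nullable dtype while reading, through the dtype parameter of pd.read_csv. The sketch below is not part of the answer above; it assumes a shortened inline copy of the question's CSV (CSV_TEXT and int_dtypes are illustrative names) instead of a real file.

```python
import io

import pandas as pd

# Shortened copy of the CSV from the question (illustrative subset of rows).
CSV_TEXT = """db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
abc,2020-06-11 00:00:00.000,35,11,,xyz
"""

# Map each count column to the nullable integer dtype (capital I).
int_dtypes = {
    "RequestCount": "Int64",
    "ScheduledCount": "Int64",
    "PerformedCount": "Int64",
}

# Empty fields are read as missing values instead of forcing float64.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=int_dtypes)

print(df.dtypes)
```

This skips the intermediate float64 step, and no value is filled in for the empty fields.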