Convert float to int while handling NaN values

Time:09-28

I have a CSV file as shown below:

db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
abc,2020-06-03 00:00:00.000,52,20,36,xyz
abc,2020-06-04 00:00:00.000,37,12,17,xyz
abc,2020-06-05 00:00:00.000,57,23,39,xyz
abc,2020-06-08 00:00:00.000,37,13,28,xyz
abc,2020-06-09 00:00:00.000,48,16,31,xyz
abc,2020-06-10 00:00:00.000,60,24,40,xyz
abc,2020-06-11 00:00:00.000,35,11,,xyz

I am reading it with pandas (pd.read_csv). Because RequestCount and PerformedCount each contain one missing value (NaN), pandas reads both columns as float64.

But I need all the numeric columns to be of type integer.

How can I achieve this?

I tried astype(int), but it fails on the NaN values.
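The failure can be reproduced with a small sketch. It assumes two rows copied from the CSV above and uses io.StringIO in place of the real file; the names csv_text, request_dtype and cast_failed are only illustrative.

```python
import io

import pandas as pd

# Two rows copied from the CSV in the question; the second has an empty RequestCount.
csv_text = """db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
"""

df = pd.read_csv(io.StringIO(csv_text))

# The empty cell becomes NaN, which forces the whole column to float64.
request_dtype = str(df["RequestCount"].dtype)

try:
    df["RequestCount"].astype(int)
    cast_failed = False
except ValueError:
    # NumPy integer dtypes cannot represent NaN, so pandas refuses the cast.
    cast_failed = True

print(request_dtype, cast_failed)
```

ScheduledCount has no missing values, so it is read as int64 without any extra work.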

CodePudding user response:

If you are using pandas ≥1.0, you can benefit from the new nullable integer type:

df['RequestCount'] = df['RequestCount'].astype('Int64')

Note the capital I in Int64: lowercase int64 is the NumPy dtype, which cannot hold missing values.

Output (only RequestCount was converted, so PerformedCount is still float64):

    db                     date  RequestCount  ScheduledCount  PerformedCount Product
0  abc  2020-06-01 00:00:00.000            51              22            37.0     xyz
1  abc  2020-06-02 00:00:00.000          <NA>              11            19.0     xyz
2  abc  2020-06-03 00:00:00.000            52              20            36.0     xyz
3  abc  2020-06-04 00:00:00.000            37              12            17.0     xyz
4  abc  2020-06-05 00:00:00.000            57              23            39.0     xyz
5  abc  2020-06-08 00:00:00.000            37              13            28.0     xyz
6  abc  2020-06-09 00:00:00.000            48              16            31.0     xyz
7  abc  2020-06-10 00:00:00.000            60              24            40.0     xyz
8  abc  2020-06-11 00:00:00.000            35              11             NaN     xyz

The missing value is still detected by isna():

>>> df['RequestCount'].isna()
0    False
1     True
2    False
...
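If all three count columns should end up as nullable integers, one option (not taken from the answer above) is to request the Int64 dtype directly when reading. This is a minimal sketch, assuming three rows of the question's CSV and io.StringIO in place of the real file; count_cols, dtypes and missing are illustrative names.

```python
import io

import pandas as pd

# Three rows copied from the question's CSV, including both rows with an empty cell.
csv_text = """db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
abc,2020-06-11 00:00:00.000,35,11,,xyz
"""

count_cols = ["RequestCount", "ScheduledCount", "PerformedCount"]

# Ask read_csv for the nullable integer dtype so the columns never become float64.
df = pd.read_csv(io.StringIO(csv_text), dtype={col: "Int64" for col in count_cols})

dtypes = {col: str(df[col].dtype) for col in count_cols}
missing = int(df[count_cols].isna().sum().sum())  # empty cells are kept as <NA>

print(dtypes)
print(missing)
```

The empty cells stay missing rather than being replaced by a placeholder number, so they can still be told apart from real counts.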

CodePudding user response:

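Fill the missing values with 0 first, then convert:

import pandas as pd

df = pd.read_csv('my_file.csv')
df.fillna(0, inplace=True)  # replace every NaN with 0
# astype returns a new DataFrame, so assign the result back
df = df.astype({'RequestCount': 'Int64', 'ScheduledCount': 'Int64', 'PerformedCount': 'Int64'})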
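A self-contained variant of this approach might look like the sketch below. It assumes two rows of the question's CSV, uses io.StringIO in place of my_file.csv, and fills only the count columns; count_cols, filled and result are illustrative names. Because no missing values remain after filling, the plain int64 dtype is enough here.

```python
import io

import pandas as pd

# Two rows copied from the question's CSV; the second has an empty RequestCount.
csv_text = """db,date,RequestCount,ScheduledCount,PerformedCount,Product
abc,2020-06-01 00:00:00.000,51,22,37,xyz
abc,2020-06-02 00:00:00.000,,11,19,xyz
"""

count_cols = ["RequestCount", "ScheduledCount", "PerformedCount"]
df = pd.read_csv(io.StringIO(csv_text))

# Fill only the count columns with 0, leaving the other columns untouched.
filled = df.fillna({col: 0 for col in count_cols})

# astype returns a new DataFrame; no NaN is left, so plain int64 works.
result = filled.astype({col: "int64" for col in count_cols})

print(result.dtypes)
```

After this conversion a filled 0 looks the same as a genuine count of zero, so this variant fits best when that distinction does not matter.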