Why NaN values are not getting counted as null datatypes while using info() method?

Time:05-12

I am working with the credit card approval dataset from the UCI ML repository. The dataset contains missing values marked as '?'.

display(cc_apps.tail(17))

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
673 ?   29.50   2.000   y   p   e   h   2.000   f   f   0   f   g   00256   17  -
674 a   37.33   2.500   u   g   i   h   0.210   f   f   0   f   g   00260   246 -
675 a   41.58   1.040   u   g   aa  v   0.665   f   f   0   f   g   00240   237 -
676 a   30.58   10.665  u   g   q   h   0.085   f   t   12  t   g   00129   3   -
677 b   19.42   7.250   u   g   m   v   0.040   f   t   1   f   g   00100   1   -
678 a   17.92   10.210  u   g   ff  ff  0.000   f   f   0   f   g   00000   50  -
679 a   20.08   1.250   u   g   c   v   0.000   f   f   0   f   g   00000   0   -
680 b   19.50   0.290   u   g   k   v   0.290   f   f   0   f   g   00280   364 -
681 b   27.83   1.000   y   p   d   h   3.000   f   f   0   f   g   00176   537 -
682 b   17.08   3.290   u   g   i   v   0.335   f   f   0   t   g   00140   2   -
683 b   36.42   0.750   y   p   d   v   0.585   f   f   0   f   g   00240   3   -
684 b   40.58   3.290   u   g   m   v   3.500   f   f   0   t   s   00400   0   -
685 b   21.08   10.085  y   p   e   h   1.250   f   f   0   f   g   00260   0   -
686 a   22.67   0.750   u   g   c   v   2.000   f   t   2   t   g   00200   394 -
687 a   25.25   13.500  y   p   ff  ff  2.000   f   t   1   t   g   00200   1   -
688 b   17.92   0.205   u   g   aa  v   0.040   f   f   0   f   g   00280   750 -
689 b   35.00   3.375   u   g   c   h   8.290   f   f   0   t   g   00000   0   -

I converted these '?' to NaN using the replace() method.

cc_apps_train = cc_apps_train.replace('?', 'NaN')

But when I print the data frame information using the info() method, it does not show any null values.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 462 entries, 382 to 102
Data columns (total 14 columns):
0     462 non-null object
1     462 non-null object
2     462 non-null float64
3     462 non-null object
4     462 non-null object
5     462 non-null object
6     462 non-null object
7     462 non-null float64
8     462 non-null object
9     462 non-null object
10    462 non-null int64
12    462 non-null object
14    462 non-null int64
15    462 non-null object
dtypes: float64(2), int64(2), object(10)
memory usage: 54.1  KB

Can anyone please explain this?

CodePudding user response:

Your line of code:

cc_apps_train = cc_apps_train.replace('?', 'NaN')

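replaces the string '?' with the string 'NaN', not with a missing value. Because 'NaN' is an ordinary string, info() counts it as a non-null object. Pass numpy.nan as the replacement instead (the older numpy.NaN alias was removed in NumPy 2.0) and those entries will be counted as null.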
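
To make the difference concrete, here is a minimal sketch. The small data frame is a made-up stand-in for cc_apps_train (its values are illustrative, not taken from the real dataset), and the variable names are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for cc_apps_train; '?' marks missing entries
df = pd.DataFrame({0: ["a", "?", "b"], 1: ["29.50", "37.33", "?"]})

# Replacing '?' with the *string* 'NaN' leaves ordinary, non-null strings behind
as_string = df.replace("?", "NaN")

# Replacing '?' with np.nan inserts real missing values
as_missing = df.replace("?", np.nan)

# Count the null cells in each version
string_nulls = as_string.isnull().sum().sum()
missing_nulls = as_missing.isnull().sum().sum()

print(string_nulls, missing_nulls)
```

Calling as_missing.info() should then report fewer non-null entries for the affected columns, while as_string.info() still reports every entry as non-null.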
