Here is a sample CSV file:
X,Y
A,
B,
C,D
After reading this file, pandas treats the empty cells as NaN values:
import pandas as pd

df = pd.read_csv("test.csv")
print(df)

   X    Y
0  A  NaN
1  B  NaN
2  C    D
and after converting it to a list of Python dictionaries, NaN appears as nan:
>>> d = df.to_dict(orient='records')
>>> d
[{'X': 'A', 'Y': nan}, {'X': 'B', 'Y': nan}, {'X': 'C', 'Y': 'D'}]
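These nan entries are ordinary floating-point NaN values; the lowercase nan is simply how Python's repr prints them. A quick check in the same session:

>>> type(d[0]['Y'])
<class 'float'>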
I'm trying to find where the nulls are using math.isnan(), but it throws an exception:
import math

for i, v in enumerate(d):
    if math.isnan(v['Y']):
        print(i)

0
1
TypeError: must be real number, not str
The exception occurs because math.isnan() only accepts real numbers and the last row holds the string 'D'. It can be handled with a try/except:
for i, v in enumerate(d):
    try:
        if math.isnan(v['Y']):
            print(i)
    except:
        pass
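A slightly safer sketch of the same workaround would check the type before calling math.isnan(), so non-float values are skipped instead of caught:

for i, v in enumerate(d):
    # only floats can be NaN; strings like 'D' are simply skipped
    if isinstance(v['Y'], float) and math.isnan(v['Y']):
        print(i)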
But is there a better way to find nan values?
CodePudding user response:
IIUC use pandas.isna:
for i, v in enumerate(d):
    if pd.isna(v['Y']):
        print(i)
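For reference, pd.isna also works on scalars, so it handles the mixed float/str values in this column without raising:

>>> import pandas as pd
>>> pd.isna(float('nan'))
True
>>> pd.isna(None)
True
>>> pd.isna('D')
False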
CodePudding user response:
You don't need any function: you can compare the value to itself. NaN has the interesting property of not being equal to itself:
for i, v in enumerate(d):
    if v['Y'] != v['Y']:
        print(i)
output:
0
1
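To see the property in isolation, here is a quick standalone check:

>>> nan = float('nan')
>>> nan != nan
True
>>> 'D' != 'D'
False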
Another possibility would be to generate this list from the DataFrame itself (note that you would get the real indices here: 0 and 1 if this is a range index, otherwise the first and second values of the index):
s = df['Y'].isna()
na_indices = s[s].index.to_list()
output: [0, 1]
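An equivalent one-liner, assuming the same df, filters the index directly with the boolean mask:

na_indices = df.index[df['Y'].isna()].tolist()
print(na_indices)  # [0, 1]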