I am trying to loop through a subset of my dataframe to find all the NaN values and record the column name and row location of each one in a dictionary.
The output should look something like this:
{'row': 2, 'column': 'First Name*', 'message': 'This is a required field'}
Here is the code I have so far to achieve this:
errors = []
req_cols = ['First Name*','Last Name*','Country*','Company*','Email Address*']
bad_nan = df.loc[df[req_cols].isna().any(1)]
for col in bad_nan.columns:
    bad_nan[col] = bad_nan[col].astype('str')
    for i in range(bad_nan.shape[0]):
        if bad_nan.loc[i, col] == 'nan':
            errors.append({ "row": i,
                            "column": col,
                            "message": "This is a required field" })
I have tried replacing == 'nan' with == 'np.nan', but I still get a KeyError. The traceback points to this line:
if bad_nan.loc[i, col] == 'nan':
I am really stuck on why I am getting KeyError: 0 here. Any help would be appreciated.
CodePudding user response:
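You get the error because bad_nan has no row with index label 0. bad_nan is a filtered subset of df, so it keeps the original index labels of the rows that passed the filter, and .loc looks rows up by label, not by position. Loop over the index values themselves instead. Also, test for missing values with pd.isna: a comparison such as == np.nan is always False, because NaN never compares equal to anything, and converting the column to str first only hides the missing values behind the text 'nan'.
import pandas as pd
# Check only the required columns; bad_nan.columns would also flag optional ones.
for col in req_cols:
    # Iterate over the actual index labels, which need not start at 0.
    for i in bad_nan.index:
        # pd.isna detects NaN/None; == np.nan would never be True.
        if pd.isna(bad_nan.loc[i, col]):
            errors.append({ "row": i,
                            "column": col,
                            "message": "This is a required field" })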
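If the nested loop becomes slow on a large frame, the same list can probably be built without looping over cells at all. The sketch below is one possible way, not the only one: it stacks the boolean isna() mask into a Series keyed by (row label, column name) and keeps the True entries. The sample rows are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "First Name*": ["Ann", "Bob", np.nan],
    "Last Name*": ["Lee", np.nan, "Ray"],
    "Country*": ["US", "UK", "FR"],
    "Company*": ["A", "B", "C"],
    "Email Address*": ["a@x.com", "b@x.com", np.nan],
})
req_cols = ["First Name*", "Last Name*", "Country*", "Company*", "Email Address*"]

# Boolean frame: True wherever a required field is missing.
mask = df[req_cols].isna()

# stack() turns the frame into a Series indexed by (row label, column name);
# indexing that Series by itself keeps only the True entries.
stacked = mask.stack()
missing = stacked[stacked]

# One dictionary per missing cell, using the original row labels of df.
errors = [
    {"row": row, "column": col, "message": "This is a required field"}
    for row, col in missing.index
]
```

Because the row values are index labels of df, they stay valid for df.loc lookups even when the index does not start at 0 or has gaps.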