I am trying to find all the NaN values in certain columns and then print a message saying that NaN entries were found in those columns.
import pandas as pd
import numpy as np
data=[[np.nan, 'Indiana','[email protected]']]
df=pd.DataFrame(data,columns=['Name','State','Email'])
req_dict={"Name","Email"}
Here is the sample DataFrame. Notice that Name and Email are required but State is not, because not all columns are required to have values in them.
I have tried to write a function to do this, but it is not working as intended:
def req_cols(df,req_dict):
    for d in req_dict:
        for i in df.index():
            if df.loc[i,d]== pd.notnull():
                print('a blank was found in' d)
    return
I understand a function is overkill for this, but it makes sense in the actual project.
I expect to get a print statement saying "a blank was found in Name".
How do I create a function to find blanks in a DataFrame by using the column names in a separate dictionary?
CodePudding user response:
Try using .isna() together with .any():
for c in req_dict:
    if df[c].isna().any():
        print("a blank was found in", c)
Prints:
a blank was found in Name
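To see why this works where an equality test does not, here is a small sketch of what each step returns. NaN never compares equal to anything, including itself, so `==` cannot detect it. The names `nan_equals_itself`, `mask` and `has_blank` are only illustrative and do not appear in the question.

```python
import numpy as np
import pandas as pd

data = [[np.nan, "Indiana", "[email protected]"]]
df = pd.DataFrame(data, columns=["Name", "State", "Email"])

# NaN is not equal to itself, so an `==` comparison cannot find it.
nan_equals_itself = np.nan == np.nan

# .isna() returns a boolean Series: True wherever a value is missing.
mask = df["Name"].isna()

# .any() reduces that Series to a single bool: True if at least one value is missing.
has_blank = bool(mask.any())

print(nan_equals_itself, mask.tolist(), has_blank)
```

Two other things stop the original loop before it gets that far: `df.index` is an attribute rather than a method, so `df.index()` raises a TypeError, and `pd.notnull` needs the object to test as its argument.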
Complete example:
import numpy as np
import pandas as pd

data = [[np.nan, "Indiana", "[email protected]"]]
df = pd.DataFrame(data, columns=["Name", "State", "Email"])
req_dict = {"Name", "Email"}  # note: this is a set of required column names

def check_nan_columns(df, cols):
    # Collect every required column that contains at least one NaN.
    out = []
    for c in cols:
        if df[c].isna().any():
            out.append(c)
    return out

for c in check_nan_columns(df, req_dict):
    print("a blank was found in", c)
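If you prefer to avoid the explicit loop, the same check can probably be done on all required columns at once. This is only a sketch of an alternative, not a replacement for the function above; `check_nan_columns_vectorized` and `blank_cols` are hypothetical names introduced here. The set is converted to a sorted list first, because a set has no defined order and recent pandas versions do not accept a set as a column indexer.

```python
import numpy as np
import pandas as pd

data = [[np.nan, "Indiana", "[email protected]"]]
df = pd.DataFrame(data, columns=["Name", "State", "Email"])
req_dict = {"Name", "Email"}

def check_nan_columns_vectorized(df, cols):
    # Hypothetical helper: sort the names so the result order is stable.
    cols = sorted(cols)
    # One boolean per column: True if that column holds at least one NaN.
    has_nan = df[cols].isna().any()
    # Keep only the column names whose flag is True.
    return has_nan[has_nan].index.tolist()

blank_cols = check_nan_columns_vectorized(df, req_dict)
for c in blank_cols:
    print("a blank was found in", c)
```

With the sample data, only Name contains a NaN, so it is the only column reported.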