I want to create a function that writes an error statement to an error dictionary. This dictionary will be output as JSON so that the real dataset can be corrected. Each error statement must include the row number, the column name, and a simple sentence.
import pandas as pd
import numpy as np
data=[[np.nan, 'Indiana','[email protected]']]
df=pd.DataFrame(data,columns=['Name','State','Email'])
req_dict={"Name","Email"}
errors={}
Use errors as the error dictionary to write to.
I have tried this, but it does not read the row number correctly, and instead of adding to the error dictionary it overwrites the data that was previously added.
def req_cols (df,req_dict,errors):
    for c in req_dict:
        for i in df.index:
            if df[c].isna().any():
                errors={ "row": i,
                         "column": c,
                         "message": "This is a required field, fill in " c " accordingly" }
    return errors
I expect the output to be:
{ "row": 0, "column": "Name",
  "message": "This is a required field, fill in 'Name' accordingly" }
How do I create an error logging dictionary to append each new error to, that has the row location and column name of the error value?
CodePudding user response:
You could first get a list of the (index, column) positions of the NaN values in the required fields, then build an error message for each of those cells in a list comprehension.
Input
data=[[np.nan, 'Indiana','[email protected]'], ['Ben', 'Alaska','[email protected]'], ['Alan', 'Florida', np.nan]]
df=pd.DataFrame(data,columns=['Name','State','Email'])
req_fields={"Name","Email"} # btw, this is a set, not a dict
print(df)
   Name    State              Email
0   NaN  Indiana  [email protected]
1   Ben   Alaska  [email protected]
2  Alan  Florida                NaN
EDIT
You still need to know the specific index/column position of each NaN. Otherwise you create an error message for every element in a column just because there is one NaN somewhere in that column.
The corrected version of your attempt:
def req_cols (df,req_dict):
    lst_of_errors = []  # collect every error instead of overwriting a single dict
    for c in req_dict:
        for i in df.index:
            # check the single cell at row i, column c
            if pd.isna(df.at[i,c]):
                errors={ "row": i,
                         "column": c,
                         "message": f'This is a required field, fill in "{c}" accordingly'}
                lst_of_errors.append(errors)
    return lst_of_errors
print(req_cols(df,req_fields))
I used f-strings for creating the string in the error message of your dict. For more details, see the official documentation.
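If you would rather keep adding to a container that you pass in (as in your original signature), a minimal sketch could look like the one below. It assumes errors is a list rather than a dict; the function name req_cols_into and the example.com addresses are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd


def req_cols_into(df, req_fields, errors):
    """Append one error dict per missing required cell to the list `errors`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # sorted() only makes the order of the messages deterministic,
    # because iterating over a set has no guaranteed order
    for c in sorted(req_fields):
        for i in df.index:
            if pd.isna(df.at[i, c]):
                errors.append({
                    "row": i,
                    "column": c,
                    "message": f'This is a required field, fill in "{c}" accordingly',
                })
    return errors


# made-up sample data
df = pd.DataFrame(
    [[np.nan, "Indiana", "a@example.com"], ["Ben", "Alaska", np.nan]],
    columns=["Name", "State", "Email"],
)

errors = []
req_cols_into(df, {"Name", "Email"}, errors)
req_cols_into(df, {"State"}, errors)  # a second call adds to the same list
```

Because the list is mutated in place, earlier entries survive later calls, which is the behaviour the original attempt lost by reassigning errors inside the loop.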
Old:
My solution for the task:
Search for NaN and get a list with index/column:
mask = pd.isna(df[list(req_fields)]).stack()
all_nan_fields = mask.loc[mask].index.tolist()
print(all_nan_fields)
[(0, 'Name'), (2, 'Email')]
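To see what the two lines above do, here is the same idea split into separate steps on a smaller, made-up frame. The variable names is_missing and positions are only illustrative.

```python
import numpy as np
import pandas as pd

# made-up sample data with one missing value per row
df = pd.DataFrame(
    [[np.nan, "Indiana"], ["Ben", np.nan]],
    columns=["Name", "State"],
)
required = ["Name", "State"]

# boolean DataFrame with the same shape as df[required]: True where a cell is missing
is_missing = df[required].isna()

# stack() turns it into a boolean Series indexed by (row label, column name)
mask = is_missing.stack()

# keep only the True entries and read their index as (row, column) tuples
positions = mask[mask].index.tolist()
```

Each tuple in positions identifies exactly one missing cell, so no per-row loop is needed.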
Use this to build your error messages:
list_with_errors = [
    {"row": elem[0],
     "column": elem[1],
     "message": f"This is a required field, fill in '{elem[1]}' accordingly"}
    for elem in all_nan_fields
]
print(list_with_errors)
[{'row': 0,
  'column': 'Name',
  'message': "This is a required field, fill in 'Name' accordingly"},
 {'row': 2,
  'column': 'Email',
  'message': "This is a required field, fill in 'Email' accordingly"}]
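Since the question mentions writing the errors out as JSON, here is a minimal sketch of that last step with the standard json module. The helper name to_builtin and the sample list are illustrative only. The fallback is there on the assumption that row labels coming from a pandas index may be NumPy integers, which json.dumps does not serialize by default.

```python
import json

import numpy as np

# sample error list; the second row label is deliberately a NumPy integer
list_with_errors = [
    {"row": 0,
     "column": "Name",
     "message": "This is a required field, fill in 'Name' accordingly"},
    {"row": np.int64(2),
     "column": "Email",
     "message": "This is a required field, fill in 'Email' accordingly"},
]


def to_builtin(value):
    """Fallback for json.dumps: convert NumPy scalars to plain Python values."""
    if isinstance(value, np.generic):
        return value.item()
    raise TypeError(f"Cannot serialize {type(value).__name__} to JSON")


# json.dumps calls `default` only for objects it cannot serialize itself
errors_json = json.dumps(list_with_errors, indent=2, default=to_builtin)
print(errors_json)
```

To write a file instead of a string, json.dump takes the same default argument together with an open file handle.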