Writing an error statement to a dictionary using a function

Time:11-03

I want to create a function that writes an error statement to an error dictionary. This dictionary will be output as JSON for corrections in the real dataset. The error statement must include the row number, the column name, and a simple sentence.

import pandas as pd
import numpy as np

data=[[np.nan, 'Indiana','[email protected]']]
df=pd.DataFrame(data,columns=['Name','State','Email'])

req_dict={"Name","Email"}

errors={}

Use errors as the error dictionary to write to

I have tried this, but it does not read the row number correctly, and instead of adding to the error dictionary it overwrites the data that was previously added.

def req_cols (df,req_dict,errors):
    for c in req_dict:
        for i in df.index:
            if df[c].isna().any():
                errors={ "row": i,                                    
                     "column": c,                                                 
                     "message": "This is a required field, fill in " c  " accordingly" }                        
    return errors

I expect the output to be

{ "row": 0, "column": Name,                                                 
  "message": "This is a required field, fill in "Name " accordingly" }  

How do I create an error logging dictionary to append each new error to, that has the row location and column name of the error value?

CodePudding user response:

You could first get a list with the index and column of each NaN in the required fields, then build an error message for each of those cells in a list comprehension.

Input

data=[[np.nan, 'Indiana','[email protected]'], ['Ben', 'Alaska','[email protected]'], ['Alan', 'Florida', np.nan]]
df=pd.DataFrame(data,columns=['Name','State','Email'])
req_fields={"Name","Email"} # btw, this is a set, not a dict
print(df)
   Name    State           Email
0   NaN  Indiana  [email protected]
1   Ben   Alaska  [email protected]
2  Alan  Florida             NaN

EDIT
The corrected version of your attempt: you still need to know the specific index/column position of each NaN; otherwise you create an error message for every element in a column just because the column contains one NaN somewhere.

def req_cols (df,req_dict):
    lst_of_errors = []
    for c in req_dict:
        for i in df.index:
            if pd.isna(df.at[i,c]):
                errors={ "row": i,                                    
                     "column": c,                                                 
                     "message": f'This is a required field, fill in "{c}" accordingly'}
                lst_of_errors.append(errors)
    return lst_of_errors

print(req_cols(df,req_fields))

I used f-strings for creating the string in the error message of your dict. For more details, see the official documentation.
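To illustrate the difference between checking the whole column and checking a single cell, here is a minimal sketch. The two-row frame is made up for the illustration and is not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame(
    [[np.nan, "Indiana"], ["Ben", "Alaska"]],
    columns=["Name", "State"],
)

# Column-level check: true as soon as any row in the column is NaN,
# so it gives the same answer no matter which row index you are on.
column_has_nan = df["Name"].isna().any()

# Cell-level check: looks only at the value at (row, column).
cell_checks = [pd.isna(df.at[i, "Name"]) for i in df.index]

print(column_has_nan)
print(cell_checks)
```

With the column-level check inside the row loop, every row of "Name" would be reported; the cell-level check flags only row 0.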

Old:
My solution for the task:

Search for NaN and get a list with index/column:

mask = pd.isna(df[list(req_fields)]).stack()
all_nan_fields = mask.loc[mask].index.tolist()
print(all_nan_fields)
[(0, 'Name'), (2, 'Email')] 

Use this to build your error messages:

list_with_errors = [
    {"row": elem[0], 
     "column": elem[1],
     "message": f"This is a required field, fill in '{elem[1]}' accordingly"}
    for elem in all_nan_fields
]
print(list_with_errors)
[{'row': 0,
  'column': 'Name',
  'message': "This is a required field, fill in 'Name' accordingly"},
 {'row': 2,
  'column': 'Email',
  'message': "This is a required field, fill in 'Email' accordingly"}]
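Since the question asks for errors to be appended rather than overwritten and then written out as JSON, the following is one possible sketch of that. The function name `collect_missing`, the sample rows, and the example.com address are placeholders I made up, and it assumes the DataFrame has an integer index.

```python
import json

import numpy as np
import pandas as pd


def collect_missing(df, required, errors):
    """Append one error dict per missing required cell to the list `errors`."""
    # Walk the columns in DataFrame order so the result is deterministic
    # even though `required` is a set.
    for col in (c for c in df.columns if c in required):
        missing_rows = df.index[df[col].isna().to_numpy()]
        for row in missing_rows:
            errors.append({
                "row": int(row),  # assumes an integer index; plain int is JSON-safe
                "column": col,
                "message": f"This is a required field, fill in '{col}' accordingly",
            })
    return errors


# Hypothetical sample data, not the data from the question.
df = pd.DataFrame(
    [[np.nan, "Indiana", "someone@example.com"], ["Ben", "Alaska", np.nan]],
    columns=["Name", "State", "Email"],
)

errors = []  # a list, so each new error is appended instead of overwriting
collect_missing(df, {"Name", "Email"}, errors)
print(json.dumps(errors, indent=2))
```

Passing the same `errors` list to later checks keeps adding to it, which is why a list of dicts is used here instead of a single dict.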