Appending all values to a double nested dictionary

I have a doubly nested dictionary that I would like to add more entries to, but when I try, everything already in it gets overwritten.

import pandas as pd

data = {'First Name': ['Sally', 'Bob', 'Sue', 'Tom', 'Will'],
        'Last Name': ['William', '', 'Wright', 'Smith', 'Thomas'],
        'Email Address': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', ''],
        'Industry': ['Automotive', 'Gas', 'Healthcare', 'Other', 'Biotech / Pharma'],
        'SME Vertical': ['Education', 'hotels', '', 'project management and design', ''],
        'System Type': ['Access', 'Access', 'video Systems', 'Access', 'Access'],
        'Account Type': ['Commercial', '', 'Reseller', '', 'Small']}

df = pd.DataFrame(data)
df1 = df[['Industry', 'System Type', 'Account Type', 'SME Vertical']]
req_cols = ['First Name', 'Last Name', 'Email Address']

# one inner dictionary of errors per file (filename is defined elsewhere)
errors = {}
errors[filename] = {}

Here is the code for my first loop, which checks the values against an external JSON file and writes all the errors to the dictionary 'errors'. This part works great.

# valid (from the JSON file) maps each column name to its allowed values
mask = df1.apply(lambda c: c.isin(valid[c.name]))
invalid = df1.mask(mask | df1.eq(' ')).stack()
for err_i, (r, v) in enumerate(invalid.items()):
    errors[filename][err_i] = {"row": r[0],
                               "column": r[1],
                               "message": v + " is invalid"}

Its output looks something like this:

     key            Type     Size                    Value
Data Template       dict       6      {'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}
                                      {'row': 1, 'column': 'SME Vertical', 'message': 'hotels is invalid'}
                                      {'row': 2, 'column': 'Industry', 'message': 'healthcare is invalid'}
                                      {'row': 3, 'column': 'Industry', 'message': 'other is invalid'}
                                      {'row': 3, 'column': 'SME Vertical', 'message': 'project management and design is invalid'}
                                      {'row': 4, 'column': 'Account Type', 'message': 'small is invalid'}

I would like to add this piece of code to write more errors into the nested dictionary above. The code finds the NaN and blank values in req_cols:

bad_nan = df.loc[df[req_cols].isna().any(axis=1)]
bad_nan = bad_nan.fillna(value='NaN')
for col in bad_nan.columns:
    for i in bad_nan.index:
        if bad_nan.loc[i, col] == 'NaN':
            errors[filename] = {"row": i,
                                "column": col,
                                "message": "This is a required field"}

Its output overwrites all the existing data in the nested dictionary. How do I just add more entries to the nested dictionary instead? I would like to collect all the invalid-format and required-field errors in the same dictionary, so it looks something like this:

key            Type     Size                    Value
Data Template   dict       6          {'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}
                                      {'row': 1, 'column': 'SME Vertical', 'message': 'hotels is invalid'}
                                      {'row': 2, 'column': 'Industry', 'message': 'healthcare is invalid'}
                                      {'row': 3, 'column': 'Industry', 'message': 'other is invalid'}
                                      {'row': 3, 'column': 'SME Vertical', 'message': 'project management and design is invalid'}
                                      {'row': 4, 'column': 'Account Type', 'message': 'small is invalid'}
                                      {'row': 2, 'column' : 'Last Name', 'message': 'this is a required field'}
                                      {'row': 5, 'column' : 'Email Address', 'message': 'this is a required field'}

CodePudding user response：

Your printouts only showed the values in your errors dictionary, I think. When I print using

print('\n'.join(map(str, errors[filename].items())))

I see that the dictionary is keyed by error number. For example:

(0, {'row': 0, 'column': 'SME Vertical', 'message': 'Education is invalid'})
(1, {'row': 1, 'column': 'Industry', 'message': 'Gas is invalid'})
(2, {'row': 1, 'column': 'Account Type', 'message': ' is invalid'})
...

This makes sense to me based on the code in your first loop:

    errors[filename][err_i] = {"row": r[0],
                               "column": r[1],
                               "message": v + " is invalid"}

Note the use of the key, err_i.
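
The difference between the two kinds of assignment is easier to see in isolation. Here is a minimal sketch with a made-up filename and made-up error records (neither is taken from your data). It only illustrates that assigning to errors[filename] replaces the whole inner dictionary, while assigning to errors[filename][key] adds a single entry:

```python
# Hypothetical key and records, for illustration only.
filename = "example.csv"
first_error = {"row": 1, "column": "Industry", "message": "Gas is invalid"}
second_error = {"row": 2, "column": "Last Name", "message": "This is a required field"}

# Assigning to errors[filename] rebinds the whole inner dictionary,
# so the entry stored under key 0 is lost.
errors = {filename: {0: first_error}}
errors[filename] = second_error
overwritten = errors[filename]

# Assigning to errors[filename][key] adds one entry and keeps the rest.
errors = {filename: {0: first_error}}
errors[filename][1] = second_error
kept = errors[filename]

print(overwritten)   # only the second error remains
print(sorted(kept))  # both keys are present
```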

I submit you want your second loop to use an error index as well, to assign values to errors[filename][err_i] rather than errors[filename]. Perhaps something like this:

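      # err_i must start at the next unused key (e.g. len(errors[filename])),
      # otherwise the last entry written by the first loop is overwritten
      if bad_nan.loc[i, col] == 'NaN':
         errors[filename][err_i] = {"row": i,
                                    "column": col,
                                    "message": "This is a required field"}
         err_i += 1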
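
One way to keep the numbering continuous across both loops is to derive the next key from the current size of the inner dictionary, rather than carrying err_i over from the first loop. The sketch below assumes the keys are always 0, 1, 2, ... with no gaps. The helper add_error, the filename, and the sample records are hypothetical names introduced for illustration, and plain tuples stand in for the rows your two pandas loops produce:

```python
def add_error(errors, filename, row, column, message):
    """Store one error under the next unused integer key for this file."""
    bucket = errors.setdefault(filename, {})
    # Keys are 0..len-1, so len(bucket) is always the next free key.
    bucket[len(bucket)] = {"row": row, "column": column, "message": message}


filename = "example.csv"  # hypothetical
errors = {}

# Stand-in for the first loop (invalid values).
invalid_values = [(1, "Industry", "Gas"), (3, "Industry", "Other")]
for row, column, value in invalid_values:
    add_error(errors, filename, row, column, value + " is invalid")

# Stand-in for the second loop (missing required fields).
missing_fields = [(1, "Last Name"), (4, "Email Address")]
for row, column in missing_fields:
    add_error(errors, filename, row, column, "This is a required field")

for key, err in errors[filename].items():
    print(key, err)
```

Because the key comes from len(bucket), neither loop needs to know how many errors the other one recorded. If the integer keys carry no meaning of their own, a plain list with append() would probably do the same job.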