Home > database >  Turning a list of dictionaries into a double nested dictionary
Turning a list of dictionaries into a double nested dictionary

Time:08-24

I have a list that has multiple dictionary entries in it but I want to turn that list of dictionaries into a double nested dictionary. I would also like to store all entries and not overwrite anything.

Sample df:

data= {'First Name': ['Sally', 'Bob', 'Sue', 'Tom', 'Will'],
     'Last Name': ['William', '', 'Wright', 'Smith','Thomas'],
     'Indsutry': ['Automotive','Gas', 'Healthcare', 'Other', 'Biotech / Pharma'],
     'SME Vertical': ['Education', 'hotels', '', 'project management and design',''],
     'System Type': ['Access','Access','video Systems','Access','Access'],
     'Account Type': ['Commercial', '','Reseller','','Small']}

df=pd.DataFrame(data)
df1= df[["Industry",'System Type','Account Type', 'SME Vertical']]
errors= {}
filename= os.path.basename(r'sample\path\to\file.csv')

valid= {'Industry': ['Automotive', 'Banking / Finance','Biotech / Pharma','Commercial Buildings','Construction / Distribution',
                  'Consumer Products','Education','Education - K-12','Education - University / Higher','Entertainment / Media','Financial',
                  'Food & Beverage','Gas','Government','Government - Federal','Government - State / Local','Healthcare','High Security',
                  'Hospitality / Entertainment','Manufacturing / Communications','Other','Petrochem / Energy',
                  'Property Management / Real Estate','Public Facility / Non-Profit','Residential','Restaurant','Retail','Services - B2B',
                  'Technology','Telecom / Utilities','Transportation','Utilities','Food Retail','Specialized Retail','IT','Corrections',
                  'Core Commercial (SME)'],
        'SME Vertical': ['Agriculture, Food and Manufacturing','Architectural services','Arts, entertainment and recreation','Automobile',
                'Chemistry / Pharmacy','Construction','Education','Hotels','Offices','Other Industries','Other Services',
                'Project management and design','Real Estate and promotion','Restaurants, Café and Bars',
                'Energy, Infrastructure, Environment and Mining','Financial and Insurance Services',
                'Human health and social work activities','Professional, scientific, technical and communication activities',
                'Public administration and defence, compulsory social security','Retail/Wholesale','Transport, Logistics and Storage'],
        'System Type': ['Access','Access Control','Alarm Systems','Asset Tracking','Banking','Commander','EAS','Financial products','Fire',
                    'Fire Alarm','Integrated Solution','Intercom','Intercom systems','Intrusion - Traditional','Locking devices & Systems',
                    'Locks & Safes','Paging','Personal Safety','Retail & EAS Products','SaaS','SATS','Services',
                    'Sonitrol Integrated Solution','Sonitrol - Integrated Solution','Sonitrol - Managed Access',
                    'Sonitrol - Verified Audio Intrusion','Time & Attendance','TV-Distribution','Unknown','Video','Video Systems'],
        'Account Type': ['Commercial','International','National','Regional','Reseller','Residential','Small']}

mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask|df1.eq(' ')).stack()

for r, v in df1.mask(mask|df1.eq(' ')).stack().iteritems():
    errors[filename]={
    "row": r[0],
    "column": r[1],
    "message": v   " is invalid"
}

output:

[{'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}, {'row': 1, 'column': 'SME Vertical', 'message': 'hotels is invalid'}, {'row': 2, 'column': 'Industry', 'message': 'healthcare is invalid'}, {'row': 3, 'column': 'Industry', 'message': 'other is invalid'}, {'row': 3, 'column': 'SME Vertical', 'message': 'project management and design is invalid'}, {'row': 4, 'column': 'Account Type', 'message': 'small is invalid'}]

ideal output:

errors={file name:{'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}, ... etc

I have tried:

errors = {{}}

mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask|df1.eq(' ')).stack()

for r, v in df1.mask(mask|df1.eq(' ')).stack().iteritems():
    errors.update({
    "row": r[0],
    "column": r[1],
    "message": v   " is invalid"
})

but I get an error

any ideas or help are appreciated

CodePudding user response:

What you have to do is create a new dictionary within errors. I cannot see in your example where you get the file name (or key for the main dictionary) so I cannot give your a fully working example but it should be something like this:

errors = {}

filename = 'SUPER AWESOME FILE' #  or where ever you get the filename key you wanted

errors[filename] = {}

for err_i, (r, v) in enumerate(df1.mask(mask|df1.eq(' ')).stack().iteritems()):

    errors[filename][err_i] = {"row": r[0],
                               "column": r[1],
                               "message": v   " is invalid"}

Note: if filename is not a variable change it to a string as suggested in the comments by Ignatius Reilly

  • Related