I am comparing a data frame against lists of standard values (shown below). Instead of printing each error, I would like to collect the errors in a dictionary. Here is the code I have so far:
valid= {'Industry': ['Automotive', 'Banking / Finance','Biotech / Pharma','Commercial Buildings','Construction / Distribution',
'Consumer Products','Education','Education - K-12','Education - University / Higher','Entertainment / Media','Financial',
'Food & Beverage','Gas','Government','Government - Federal','Government - State / Local','Healthcare','High Security',
'Hospitality / Entertainment','Manufacturing / Communications','Other','Petrochem / Energy',
'Property Management / Real Estate','Public Facility / Non-Profit','Residential','Restaurant','Retail','Services - B2B',
'Technology','Telecom / Utilities','Transportation','Utilities','Food Retail','Specialized Retail','IT','Corrections',
'Core Commercial (SME)'],
'SME Vertical': ['Agriculture, Food and Manufacturing','Architectural services','Arts, entertainment and recreation','Automobile',
'Chemistry / Pharmacy','Construction','Education','Hotels','Offices','Other Industries','Other Services',
'Project management and design','Real Estate and promotion','Restaurants, Café and Bars',
'Energy, Infrastructure, Environment and Mining','Financial and Insurance Services',
'Human health and social work activities','Professional, scientific, technical and communication activities',
'Public administration and defence, compulsory social security','Retail/Wholesale','Transport, Logistics and Storage'],
'System Type': ['Access','Access Control','Alarm Systems','Asset Tracking','Banking','Commander','EAS','Financial products','Fire',
'Fire Alarm','Integrated Solution','Intercom','Intercom systems','Intrusion - Traditional','Locking devices & Systems',
'Locks & Safes','Paging','Personal Safety','Retail & EAS Products','SaaS','SATS','Services',
'Sonitrol Integrated Solution','Sonitrol - Integrated Solution','Sonitrol - Managed Access',
'Sonitrol - Verified Audio Intrusion','Time & Attendance','TV-Distribution','Unknown','Video','Video Systems'],
'Account Type': ['Commercial','International','National','Regional','Reseller','Residential','Small']}
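# True where a cell holds one of the allowed values for its column
mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask|df1.eq(' ')).stack()
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for r, v in df1.mask(mask|df1.eq(' ')).stack().items():
    print(f'error found in row "{r[0]}", column "{r[1]}": "{v}" is invalid')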
Here is the current output of the print statements:
error found in row "1", column "Industry": "gas" is invalid
error found in row "1", column "SME Vertical": "hotels" is invalid
error found in row "2", column "Industry": "healthcare" is invalid
error found in row "3", column "Industry": "other" is invalid
error found in row "3", column "SME Vertical": "project management and design" is invalid
error found in row "4", column "Account Type": "small" is invalid
The format of this output is fine, but I can't work out how to write it to a dictionary instead.
Example output from the dictionary:
{row “1”: column: "Industry", message: "gas" is invalid, .... etc}
CodePudding user response:
This is straightforward, but YOU need to decide what the format will be. What you have shown above is not a valid dictionary.
Maybe like this, as a list of dictionaries, one for each error?
errors = []
# items() replaces iteritems(), which was removed in pandas 2.0
for r, v in df1.mask(mask|df1.eq(' ')).stack().items():
    errors.append({
        "row": r[0],
        "column": r[1],
        "message": f'"{v}" is invalid'
    })
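If a single dictionary keyed by row is closer to what you had in mind, the same loop can fill a nested dictionary instead. The following is only a minimal sketch: the small `valid` and `df1` are hypothetical stand-ins for yours, the name `errors_by_row` is mine, and the explicit `dropna()` is a precaution in case `stack()` keeps the masked cells as NaN, which I believe depends on the pandas version.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the question's `valid` and `df1`.
valid = {
    "Industry": ["Gas", "Healthcare", "Other"],
    "Account Type": ["Commercial", "Small"],
}
df1 = pd.DataFrame(
    {
        "Industry": ["Gas", "gas", "healthcare"],
        "Account Type": ["Small", "Commercial", "small"],
    }
)

# True where a cell holds one of the allowed values for its column.
mask = df1.apply(lambda c: c.isin(valid[c.name]))

# Blank out valid cells and single-space cells, then keep only what is left:
# a Series indexed by (row, column) that holds the invalid values.
invalid = df1.mask(mask | df1.eq(" ")).stack().dropna()

# Nested dictionary: row label -> column name -> message.
errors_by_row = {}
for (row, column), value in invalid.items():
    errors_by_row.setdefault(row, {})[column] = f'"{value}" is invalid'

print(errors_by_row)
```

With this sample frame, row 0 is clean, so only rows 1 and 2 appear as keys. A row with several bad columns gets one entry per column under the same row key.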
CodePudding user response:
How about something like this (example)?
Code
import json

d = {}
# One key per error location, holding a list of messages
d['error found in row "1", column "Industry"'] = []
d['error found in row "1", column "Industry"'].append('"gas" is invalid')
d['error found in row "1", column "Industry"'].append('"hotels" is invalid')
print(json.dumps(d, indent=4))
Output
$ python test.py
{
    "error found in row \"1\", column \"Industry\"": [
        "\"gas\" is invalid",
        "\"hotels\" is invalid"
    ]
}
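The hand-written appends above could probably be generalised with a loop and `collections.defaultdict`. This is a rough sketch, not a drop-in solution: the `found` list is a hypothetical stand-in whose values are copied from the printed errors in the question, and the `"row N"` key format is just one possible choice. String keys are used because `json.dumps` cannot serialise tuple keys such as `(row, column)`.

```python
import json
from collections import defaultdict

# Hypothetical (row, column, value) triples taken from the question's output;
# in practice they would come from iterating over the stacked Series.
found = [
    (1, "Industry", "gas"),
    (1, "SME Vertical", "hotels"),
    (2, "Industry", "healthcare"),
]

# Group the messages by row; defaultdict creates the empty list on first use.
errors = defaultdict(list)
for row, column, value in found:
    errors[f"row {row}"].append(
        {"column": column, "message": f'"{value}" is invalid'}
    )

# A defaultdict is a dict subclass, so json.dumps accepts it directly.
report = json.dumps(errors, indent=4)
print(report)
```

Grouping by row keeps all problems for one record together, which may be easier to scan than one key per (row, column) pair.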