I have a dictionary
returned by a response in AWS Lambdas
which looks like below:
data = {
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}
Now, I want to format a few numeric columns like salary
to 1222.00
and convert other non-string
columns to strings for "ID"
I should get "1"
as a string and not
an Int
. Here, null is the actual NaN
returned by the response.
Below is the code which I tried:
new_dict = {}
for i, j in data.items():
print(i)
if (i == "salary"):
new_dict[i] = "%0.2f" % j
if isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
else:
new_dict[i] = str(j)
I know that I could maintain a list for such amount related columns and then I can do some kind of comparisons and replace it. When I test the value for salary inside the loop it gives me decimal places but in the response, it doesn't show up.
Output
:
{
"id": "1",
"name": "Hey",
"category": "",
"salary": "1222",
"Roles": "[{'attributes4': {'key': 'sum_assured', 'value': 192000}, 'attributes5': {'key': 'cash_value', 'value': 48576}, 'attributes6': {'key': 'expiry_date', 'value': 'null'}}]"
}
Expected:
{
"id": "1",
"name": "Hey",
"category": "",
"salary": "1222",
"Roles": "[{"attributes4": {"key": "sum_assured", "value": "192000.00"}, "attributes5": {"key": "cash_value", "value": "48576.00"}, "attributes6": {"key": "expiry_date", "value": ""}}]"
}
Also, how could I format the records of the Roles
array with some kind of indent in it? I wasn't able to do that.
Update 1
:
I tried the below code for accessing the nested attributes as per my JSON
but it gives me a key error
:
for i, j in formatted_json.items():
print(type(j))
if isinstance(j, list):
for k in j:
for key, value in k.items():
for l, m in value.items():
if (m == "salary" or m == 'value'):
new_dict[key][l] = "%0.2f" % m
elif isinstance(m, str):
new_dict[key][l] = m
elif m is None:
new_dict[key][l] = ""
else:
new_dict[key][l] = str(m)
elif (i == "salary" or i == 'value'):
new_dict[i] = "%0.2f" % j
elif isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
else:
new_dict[i] = str(j)
Traceback (most recent call last):
File "/home/.../Downloads/pyspark/src/sample/json_to_string.py", line 392, in <module>
new_dict[key][l] = m
KeyError: 'attributes4'
Update 2
:
The actual dataset is as below:
json_data = '''{
"policy_no": "",
"paid_to_date": "2016-09-28",
"proposal_no": null,
"agent_code": "1234",
"policy_status": "Death",
"policy_term": "15",
"policy_premium_payment_term": "7",
"policy_effective_date": "2013-09-28",
"last_premium_paid": 13790.85,
"last_premium_paid_date": "2016-06-17",
"pivc_flag_status": null,
"product_code": "",
"product_name": "Monthly Income Plan",
"application_date": "2013-09-16",
"underwriting_decesion_date": "2013-09-26",
"issuance_date": "2013-09-30",
"ols_policy": "N",
"next_premium_due_date": "2016-09-28",
"policy_year": null,
"premium_holiday_from_date": null,
"premium_holiday_to_date": null,
"product_type": "Non-Linked",
"proposer_client_code": "",
"life_insured_client_code": "",
"ppt_completion_date": "2020-09-28",
"ppt_left": 4,
"auto_termination_date": null,
"fund_value": null,
"loan_amt": 0,
"welcome_kit_current_status": null,
"dispatch_date": "2013-10-25",
"status_date": "2013-10-25",
"updated_on": "2013-10-26 19:51:51.983",
"surrender_value": null,
"loan_eligiability": null,
"courier_name": null,
"awb_no": "",
"status_id": 5,
"claim_registration_date": "2021-12-15",
"claim_approval_reject_date": "2021-12-15",
"claim_requirement_document": "HEART ATTACK",
"claim_requirement_status": "P",
"requirement_raised_on": "2021-12-15",
"requirement_closed_on": "2021-12-15",
"claim_status": "P",
"status_for_death_claims": "P",
"date_of_transaction": "2021-12-15",
"ecs_si_registration_date": null,
"ecs_si_status": "inactive",
"ecs_si_failure_reasons": null,
"ecs_si_status_as_per_rls": null,
"ecs_day": "0",
"payment_mode": "S",
"payment_mode_desc": "Semi Anually",
"payment_method": "S",
"payment_method_desc": "Cash/Cheque",
"payment_date": "2021-12-15",
"payment_amount": 13790.85,
"premium_base_amt": null,
"premium_tax_amt": null,
"due_premium": null,
"total_premium": 13790.85,
"advance_premium": null,
"assignee": [
{
"assignee_name": "",
"attributes1": {
"key": "assignee_client_id",
"value": ""
},
"attributes2": {
"key": "assignee_type",
"value": ""
}
}
],
"agents": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"agent_name": "",
"attributes2": {
"key": "agent_code",
"value": "1234"
},
"attributes3": {
"key": "email",
"value": ""
},
"attributes4": {
"key": "channel",
"value": "AGENCY"
},
"attributes5": {
"key": "subchannel",
"value": null
},
"attributes6": {
"key": "branch_name",
"value": ""
},
"attributes7": {
"key": "contact_no",
"value": ""
},
"attributes8": {
"key": "status",
"value": "T"
}
}
],
"bank_accounts": {
"attributes1": {
"key": "accountnumber",
"value": ""
},
"attributes2": {
"key": "accountholdername",
"value": ""
},
"attributes3": {
"key": "ifsccode",
"value": ""
},
"attributes4": {
"key": "micrcode",
"value": ""
},
"bankname": "",
"attributes5": {
"key": "branchname",
"value": ""
},
"attributes6": {
"key": "bank_address",
"value": ""
}
},
"Funds": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "allocation_mode",
"value": "percentage"
},
"attributes3": {
"key": "allocation_value",
"value": null
},
"attributes4": {
"key": "fund_code",
"value": null
},
"fund_name": null,
"investment_strategy_name": null
}
],
"Investment_Strategies": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "allocation_mode",
"value": "percentage"
},
"attributes3": {
"key": "allocation_value",
"value": null
},
"attributes4": {
"key": "fund_code",
"value": null
},
"fund_name": null,
"investment_strategy_name": null
}
],
"Roles": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": "100.00"
},
"attributes3": {
"key": "relationship",
"value": "SPOUSE"
},
"role_name": "Nominee",
"attributes4": {
"key": "age",
"value": "NaN"
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "F"
},
"attributes7": {
"key": "date_of_birth",
"value": "1953-03-20"
},
"attributes8": {
"key": "email",
"value": null
},
"attributes9": {
"key": "registred_mobile",
"value": null
}
},
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": null
},
"attributes3": {
"key": "relationship",
"value": null
},
"role_name": "Insured",
"attributes4": {
"key": "age",
"value": 63
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "M"
},
"attributes7": {
"key": "date_of_birth",
"value": "1950-06-20"
},
"attributes8": {
"key": "email",
"value": null
},
"attributes9": {
"key": "registred_mobile",
"value": null
}
},
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": null
},
"attributes3": {
"key": "relationship",
"value": null
},
"role_name": "Owner",
"attributes4": {
"key": "age",
"value": 63
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "M"
},
"attributes7": {
"key": "date_of_birth",
"value": "1950-06-20"
},
"attributes8": {
"key": "email",
"value": ""
},
"attributes9": {
"key": "registred_mobile",
"value": ""
}
}
],
"Benefits": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"benefit_name": "Miii",
"attributes2": {
"key": "benefit_term",
"value": "7"
},
"attributes3": {
"key": "modal_premium",
"value": null
},
"attributes4": {
"key": "sum_assured",
"value": 192000
},
"attributes5": {
"key": "cash_value",
"value": 48576
},
"attributes6": {
"key": "expiry_date",
"value": null
},
"attributes7": {
"key": "status",
"value": "Death"
},
"attributes8": {
"key": "benefit_code",
"value": ""
},
"attributes9": {
"key": "benefit_type",
"value": "base"
}
}
]
}
'''
CodePudding user response:
When you have a strong model in your data, and you want to process/validate the data you receive, its better to be done in a object oriented fashion. If you can consider your data value as an object with all logic written inside for validating or correcting, it becomes simple.
See an example approach below:
(Assumed your null
as None
)
import json
# Your data object..
data = '''{
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}'''
# Create simple object models to represent the data.
class Role:
def __init__(self, key, value):
self.key = key
self.value = value or "" # fallback to empty string
class Policy:
def __init__(self, id, name, category, salary, Roles):
# write all your pre/post processing here
self.id = id
self.name = name
self.category = category or ""
self.salary = format(salary, '0.2f') if salary else ""
self.Roles = [Role(**role) for role in Roles[0].values()] if Roles else []
# a json serialization helper function
def serial_default(obj):
if isinstance(obj, (Policy, Role)):
return obj.__dict__
else:
return obj
# You just create the object with your data. Now all data is in the object.
data = json.loads(data)
p = Policy(**data)
# Serialize the object to json in the proper format
print(json.dumps(p, default=serial_default))
The print generates a valid version of your object as JSON like
{"id": 1, "name": "Hey", "category": "", "salary": "1222.00", "Roles": [{"key": "sum_assured", "value": 192000}, {"key": "cash_value", "value": 48576}, {"key": "expiry_date", "value": ""}]}
This is just an example to the approach. You can modify or improvise a similar approach and make things simpler.
CodePudding user response:
Firstly i have to agree with Kris answer, this kind of conversion looks now a bit ugly and with every added special case Object-Oriented-style makes more sense.
A few points:
- null, None, ""
It seems that null won't be seen as None, so your not getting into the if clause. Check JSON to Dict for that
How can JSON data with null value be converted to a dictionary
- Salary
According to your question you would expect "salary": "1222.00" correct? If so you are currently overwriting your previously formatted number with
if isinstance(j, str):
new_dict[i] = j
Just change it to elif so you wont overwrite your conversion.
- Roles
The objects within the array of Roles are currently not iterated over. Hence the number are not being formatted. For that you would need another loop on the Roles array.
Following a recursion attempt to cover multiple nested collections.
import json
from collections import abc
dataRaw = '''
{
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}
'''
data = json.loads(dataRaw)
def processNested(nested):
if (isinstance(nested, list)):
for index in range(len(nested)):
nested[index] = processNested(nested[index])
else:
for key, value in nested.items():
if isinstance(value, abc.Mapping):
if "key" in value and "value" in value:
if (value["key"] == "cash_value"):
nested[key]["value"] = float(value["value"])
else:
nested[key] = processNested(value)
else:
if (key == "value" and value != None):
nested[key] = "%0.2f" % value
return nested
new_dict = data
for i, j in new_dict.items():
if (i == "salary"):
new_dict[i] = "%0.2f" % j
elif isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
elif i == "Roles":
new_dict[i] = processNested(j)
else:
new_dict[i] = str(j)
print(new_dict)