I am familiar with running data transformation using python on csv file format but new to running data transformation on Json format.
I have a Json streaming data and I want to apply md5 algorithm to generate hash. I have created the function responsible for generating md5 hash, but I do not know how to apply this function to the json data
Here is my md5 hash generation script
import hashlib
# initializing string
Password= "kureen2022!"
AccountName = "[email protected]"
# Convert password to bytes
pw_byte = Password.encode()
# Generate Salt
salt = AccountName.lower().encode()
# Concatenate salt with password, apply md5
result = hashlib.md5(salt pw_byte)
pw_hash = result.hexdigest()
# printing Hash value.
print(pw_hash)
here is my Json script
import json
f = open('my_json.Json')
data = json.load(f)
for i in data['details']:
print(i)
here is a sample json data
{
"details" : [
{
"AccountName": "dojoujre",
"Password": "password123"
},
{
"AccountName": "dojoujre",
"Password": "password007"
}
]
}
The objective is to apply the md5 script to the json file generate a hash
{
"details" : [
{
"AccountName": "dojoujre",
"Password": "password123",
"Hash": "93837373930"
},
{
"AccountName": "dojoujre",
"Password": "password007",
"Hash": "eer3er5t6t6y"
}
]
}
would be glad for some direction. Also if anyone can direct me to the right materials where I can learn how to perform data transformation with Json
CodePudding user response:
You can access it like a dict and iterate over the list of dicts.
Access item with :
data.get('details') # or
data['details']
You can write a new entry with:
data['result'] = xxxx
I think that is what you are asking for, isn't?
CodePudding user response:
In Python, JSON objects are handled just like dictionaries. Hence, your code shall be something similar to:
for x in data['details']:
x['Hash'] = compute_md5(x['Password'])
Here is the official doc about dictionaries. Check out also this doc which has some useful functionalities to serialize/deserialize jsons.
A short notice about md5 hashing: I'd discourage you to use md5 to compute "security" hashes in a production and/or critical environment. Md5 algorithm is basically broken and if I recall correctly, a collision can be found with only 2^11 operations, which is very easy with modern computers. You should rely on much more secure algorithms such as SHA-3.