How to remove duplicate entries from a JSON file using Python?
I have a JSON file that looks like this, and I would appreciate it if someone could help provide a solution for fixing it:
json_data = [
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9099",
                    "aks9098",
                    "aks9100",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9103",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    },
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9098",
                    "aks9098",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9102",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    },
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9098",
                    "aks9100",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9102",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    }
]
I would like to remove the duplicate entries from each collections list, and the expected result should look like this:
json_data = [
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9098",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    },
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9098",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    },
    {
        "authType": "ldap",
        "password": "",
        "permissions": [
            {
                "collections": [
                    "aks9099",
                    "aks9098",
                    "aks9100",
                    "aks9101",
                    "aks9102",
                    "aks9103"
                ],
                "project": "Central Project"
            }
        ],
        "role": "devSecOps",
        "username": "[email protected]"
    }
]
CodePudding user response:
Does the following solve your problem?
new_list = []
for i in json_data:
    # keep only the first occurrence of each top-level entry
    if i not in new_list:
        new_list.append(i)
print(new_list)
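That removes duplicate top-level entries. If the goal is instead to deduplicate the nested collections lists while keeping their original order, as in the expected output above, a minimal sketch could look like this (assuming the structure shown in the question; dict.fromkeys preserves insertion order):
import json

for user in json_data:
    for permission in user["permissions"]:
        # dict.fromkeys drops duplicates while preserving the original order
        permission["collections"] = list(dict.fromkeys(permission["collections"]))

# write the cleaned data back out as JSON text if needed
print(json.dumps(json_data, indent=4))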
CodePudding user response:
Even though OP asked to do this in Python, this can readily be done in jq using the unique function with a single update assignment:
$ jq '.[].permissions[].collections |= unique' json.txt
[
  {
    "authType": "ldap",
    "password": "",
    "permissions": [
      {
        "collections": [
          "aks9098",
          "aks9099",
          "aks9100",
          "aks9101",
          "aks9102",
          "aks9103"
        ],
        "project": "Central Project"
      }
    ],
    "role": "devSecOps",
    "username": "[email protected]"
  },
  {
    "authType": "ldap",
    "password": "",
    "permissions": [
      {
        "collections": [
          "aks9098",
          "aks9099",
          "aks9100",
          "aks9101",
          "aks9102",
          "aks9103"
        ],
        "project": "Central Project"
      }
    ],
    "role": "devSecOps",
    "username": "[email protected]"
  },
  {
    "authType": "ldap",
    "password": "",
    "permissions": [
      {
        "collections": [
          "aks9098",
          "aks9099",
          "aks9100",
          "aks9101",
          "aks9102",
          "aks9103"
        ],
        "project": "Central Project"
      }
    ],
    "role": "devSecOps",
    "username": "[email protected]"
  }
]
To invoke this in Python, one could do this:
import subprocess

# run jq and capture its stdout, which contains the deduplicated JSON
return_obj = subprocess.run(["jq", ".[].permissions[].collections |= unique", "json.txt"], stdout=subprocess.PIPE)
json_data = return_obj.stdout.decode()
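Note that json_data here is a string containing the deduplicated JSON text. If Python lists and dicts are needed instead, the string could be parsed with the standard json module, for example:
import json

parsed = json.loads(json_data)  # convert the jq output string into Python objects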