Python: If I have a list of dicts{person_id,list,account}, how can I remove duplicate person_id and

Time:12-28

Here's the gist: I'm using Django to fill a PostgreSQL database with user data from a third-party API. I pull the data from the API into Django so that I can automate filling the DB. I have the models built for the fields that need to be stored.

Here's where I need some help. I've created a list from an API response, but I want to remove duplicate users and combine their lists, like this.

What I have now.
    {
        "person_id": "1",
        "account": "5",
        "list": "c"
    },
    {
        "person_id": "1",
        "account": "5",
        "list": "b"
    },
    {
        "person_id": "1",
        "account": "5",
        "list": "a"
    },
...
What I want.
    {
        "person_id": "1",
        "account": "5",
        "list": ["a","b","c"]
    },
    {
        "person_id": "2",
        "account": "5",
        "list": ["a","c"]
    },
    {
        "person_id": "3",
        "account": "5",
        "list": ["a","b"]
    },
...

One API call I'm making gets all users in a list and responds with:

API RESPONSE
{
    "records": [
        {
            "id": "asdafdgsdfhsdfh",
            "email": "[email protected]",
            "phone_number": " 1123123"
        },
        {
            "id": "asdafdgsdfhsdfh",
            "email": "[email protected]",
            "phone_number": " 1123123"
        },
       ...
 ],
 "marker":342523452
}

From that response I am iterating over each record and creating a dict to add to a list.

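    import requests

    def personA(record):
        # `account` and `list` are set elsewhere for the list currently being fetched.
        return dict(
            person_id=record["id"],
            account=account,
            list=list,
        )

    r = requests.get(link)
    resdata = r.json()
    # Loop over the records once; looping over resdata itself would repeat them for every top-level key.
    for record in resdata["records"]:
        listC = personA(record)
        listData.append(listC)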

I am doing this for each list in the account, so some "person_id" values show up many times and some only once.

What would be the best way for me to create a list in the way that I'm going for?

CodePudding user response:

A dictionary will do here:

# `data` is the list of dicts from the question (what the question calls listData).
# Grab the account_id from the first element in data. I'm assuming the account is the same across all data points, but this is not a serious concern per the comment.
account_id = data[0]["account"]
# We maintain a dictionary that maps from a person_id to a list of strings appearing in the list field for a given person_id.
person_id_elements = {}

# Read each data point into our dictionary.
for data_point in data:
    person_id = data_point["person_id"]
    list_element = data_point["list"]
    if person_id not in person_id_elements:
        person_id_elements[person_id] = []
    
    person_id_elements[person_id].append(list_element)

# Transform person_id_elements into objects that observe the desired schema.
result = []
for person_id in person_id_elements:
    result.append({
        "person_id": person_id,
        "account": account_id,
        "list": sorted(person_id_elements[person_id]) # Sorted, as shown in the sample output.
    })

print(result)
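
For a quick check, the same idea can be wrapped in a function and run against rows shaped like the ones in the question. This is only a sketch: `merge_lists` and `sample` are names made up for illustration, the account is taken from the first row seen for each person rather than from `data[0]`, and repeated list names are skipped on the assumption that a person should appear in each list only once.

```python
def merge_lists(data):
    """Collapse rows that share a person_id into one row with a sorted list."""
    merged = {}
    for row in data:
        # setdefault creates the entry the first time a person_id is seen.
        entry = merged.setdefault(
            row["person_id"],
            {"person_id": row["person_id"], "account": row["account"], "list": []},
        )
        # Skip list names already recorded for this person.
        if row["list"] not in entry["list"]:
            entry["list"].append(row["list"])
    for entry in merged.values():
        entry["list"].sort()
    return list(merged.values())


# Hypothetical sample rows in the shape shown in the question.
sample = [
    {"person_id": "1", "account": "5", "list": "c"},
    {"person_id": "1", "account": "5", "list": "b"},
    {"person_id": "1", "account": "5", "list": "a"},
    {"person_id": "2", "account": "5", "list": "a"},
]

print(merge_lists(sample))
```

Because dicts keep insertion order, people come out in the order they were first seen.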

CodePudding user response:

You can create a mapping of the persons to the lists they appear in:

from collections import defaultdict

records = defaultdict(list)
# Iterate over the flattened dicts built in the question (listData), not the raw API records.
for record in listData:
    key = record["person_id"], record["account"]
    value = record["list"]
    records[key].append(value)

results = [
    {
        "person_id": person_id,
        "account": account,
        "list": lists
    } for (person_id, account), lists in records.items()
]
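
If the intermediate list is not needed, the grouping could also happen while the API responses are being read. The sketch below assumes the responses have already been collected into a dict mapping each list name to its "records" array; `group_people` and `records_by_list` are hypothetical names, and a set is used so a person repeated within one list is only counted once.

```python
from collections import defaultdict


def group_people(records_by_list, account):
    """Map each (person_id, account) pair to the list names it appears in."""
    grouped = defaultdict(set)
    for list_name, records in records_by_list.items():
        for record in records:
            # A set ignores a person repeated within the same list.
            grouped[(record["id"], account)].add(list_name)
    return [
        {"person_id": person_id, "account": acct, "list": sorted(names)}
        for (person_id, acct), names in grouped.items()
    ]


# Hypothetical input: list name -> the "records" array returned for that list.
records_by_list = {
    "a": [{"id": "1"}, {"id": "2"}],
    "b": [{"id": "1"}],
    "c": [{"id": "1"}, {"id": "2"}],
}

print(group_people(records_by_list, account="5"))
```

Keying on the (person_id, account) tuple keeps the same person separate across accounts if the function is later extended to handle more than one.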