How can I remove nested data if a specific key is duplicated in big JSON data?

You might downvote this post because the simple way to filter is to loop over everything, but trust me: my data is very large, so looping is very time consuming and may not be an efficient approach.

[
  {
    "from_name": "Haio",
    "from_id": 183556205,
    "receiver_name": "Shubh M",
    "targeted_id": 78545445,
    "gift_value": "$56"
  },
  {
    "from_name": "Mr. A",
    "from_id": 54545455,
    "receiver_name": "haio",
    "targeted_id": 78545445,
    "gift_value": "$7"
  }]

What do I want to accomplish?

I just want to delete a dict if its targeted_id is the same as that of another dict.

Answer:

Provided you can load the whole dataset into memory, use pandas and drop_duplicates.

import pandas as pd
data =[
  {
    "from_name": "Haio",
    "from_id": 183556205,
    "receiver_name": "Shubh M",
    "targeted_id": 78545445,
    "gift_value": '$56'
  },
  {
    "from_name": "Mr. A",
    "from_id": 54545455,
    "receiver_name": "haio",
    "targeted_id": 78545445,
    "gift_value": '$7'
  }]
df = pd.DataFrame(data).drop_duplicates(subset=['targeted_id'])
# orient='records' writes a list of objects, matching the input's shape
print(df.to_json(orient='records'))
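By default drop_duplicates keeps the first row for each targeted_id; its keep parameter also accepts 'last' or False (drop every row whose targeted_id is duplicated). If the goal is instead to remove every entry that shares a targeted_id with another one, a minimal pure-Python sketch without pandas can count the ids first with collections.Counter. The function name drop_all_duplicated and the extra "Bob" record are hypothetical, added only for illustration.

```python
from collections import Counter

def drop_all_duplicated(records, field="targeted_id"):
    """Return only the records whose `field` value occurs exactly once.

    Similar in spirit to pandas' drop_duplicates(keep=False): every entry
    sharing a duplicated value is removed, not just the later ones.
    """
    # First pass: count how often each value appears.
    counts = Counter(rec[field] for rec in records)
    # Second pass: keep only records whose value is unique.
    return [rec for rec in records if counts[rec[field]] == 1]

data = [
    {"from_name": "Haio", "targeted_id": 78545445, "gift_value": "$56"},
    {"from_name": "Mr. A", "targeted_id": 78545445, "gift_value": "$7"},
    # Hypothetical extra record with a unique targeted_id.
    {"from_name": "Bob", "targeted_id": 11111111, "gift_value": "$3"},
]
print(drop_all_duplicated(data))
# [{'from_name': 'Bob', 'targeted_id': 11111111, 'gift_value': '$3'}]
```

This makes two linear passes over the list, which is still O(n) overall.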

Answer:

Is that what you want?

source_data = [{
    "from_name": "Haio",
    "from_id": 183556205,
    "receiver_name": "Shubh M",
    "targeted_id": 78545445,
    "gift_value": '$56'
},
{
    "from_name": "Mr. A",
    "from_id": 54545455,
    "receiver_name": "haio",
    "targeted_id": 78545445,
    "gift_value": '$7'
}]

data_by_targeted_id = {}

for entry in source_data:
    if entry["targeted_id"] not in data_by_targeted_id:
        data_by_targeted_id[entry["targeted_id"]] = entry

result = list(data_by_targeted_id.values())
print(result)

Output:

[{'from_name': 'Haio', 'from_id': 183556205, 'receiver_name': 'Shubh M', 'targeted_id': 78545445, 'gift_value': '$56'}]

It keeps only the first entry for each targeted_id.
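To apply the same keep-first idea to a JSON file rather than an in-memory literal, a minimal sketch with the standard json module follows. Note that the sample in the question uses single-quoted strings ('$56'), which are valid Python but not valid JSON, so a real file needs double quotes. The function name dedupe_json_file is hypothetical, and io.StringIO stands in for real input/output files so the example runs as-is. json.load still reads the whole array into memory.

```python
import io
import json

def dedupe_json_file(src, dst, field="targeted_id"):
    """Read a JSON array from `src`, keep the first record per `field`,
    write the result to `dst`, and return how many records were dropped.
    `src` and `dst` are text file objects."""
    records = json.load(src)
    seen = set()
    unique = []
    for rec in records:
        # Set membership is O(1) on average, so this is one linear pass.
        if rec[field] not in seen:
            seen.add(rec[field])
            unique.append(rec)
    json.dump(unique, dst, indent=2)
    return len(records) - len(unique)

# io.StringIO stands in for open("input.json") and open("output.json", "w").
src = io.StringIO(
    '[{"from_name": "Haio", "targeted_id": 78545445, "gift_value": "$56"},'
    ' {"from_name": "Mr. A", "targeted_id": 78545445, "gift_value": "$7"}]'
)
dst = io.StringIO()
dropped = dedupe_json_file(src, dst)
print(dropped)  # 1
```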

Answer:

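def remove_duplicates(lst, key=lambda x: x, acc=None, keys=None):
    # Recursively keeps the first item for each distinct key(item).
    # None defaults avoid sharing mutable default lists between calls.
    acc = [] if acc is None else acc
    keys = [] if keys is None else keys
    if not lst:
        return acc
    elif key(lst[0]) in keys:
        # Key already seen: skip this item.
        return remove_duplicates(lst[1:], key=key, acc=acc, keys=keys)
    else:
        # New key: append the item and remember its key.
        return remove_duplicates(lst[1:], key=key, acc=acc + [lst[0]], keys=keys + [key(lst[0])])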
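The recursive approach makes one call per element and copies the list on each call (lst[1:]), and it checks membership in a list, so it is quadratic; on CPython it also raises RecursionError once the input is longer than the recursion limit (1000 by default). For large data, a minimal iterative sketch with the same key-function interface might look like this. The name unique_by is hypothetical.

```python
def unique_by(items, key=lambda x: x):
    """Yield the first item for each distinct key(item), preserving order.

    Works on any iterable (including generators), uses no recursion, and
    checks membership in a set, so it runs in one linear pass.
    Keys must be hashable.
    """
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

source_data = [
    {"from_name": "Haio", "targeted_id": 78545445, "gift_value": "$56"},
    {"from_name": "Mr. A", "targeted_id": 78545445, "gift_value": "$7"},
]
result = list(unique_by(source_data, key=lambda d: d["targeted_id"]))
print(result)
# [{'from_name': 'Haio', 'targeted_id': 78545445, 'gift_value': '$56'}]

# 5000 elements would exceed the default recursion limit in a recursive
# version; the loop handles it without issue.
print(len(list(unique_by(range(5000), key=lambda n: n % 10))))  # 10
```

Because it is a generator, it can also filter records as they are produced instead of building the whole input list first.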
```