You will probably downvote this post because the simple way to filter is to loop over all the items, but trust me: I have a very large dataset, and looping over it is very time-consuming and may not be an efficient approach.
[
{
"from_name": "Haio",
"from_id": 183556205,
"receiver_name": "Shubh M",
"targeted_id": 78545445,
"gift_value": '$56'
},
{
"from_name": "Mr. A",
"from_id": 54545455,
"receiver_name": "haio",
"targeted_id": 78545445,
"gift_value": '$7'
}]
What do I want to accomplish?
I just want to delete a dict if its targeted_id is the same as another dict's, so that each targeted_id appears only once.
Answer:
Provided you can load the whole dataset into memory, use pandas and drop_duplicates.
import pandas as pd
data =[
{
"from_name": "Haio",
"from_id": 183556205,
"receiver_name": "Shubh M",
"targeted_id": 78545445,
"gift_value": '$56'
},
{
"from_name": "Mr. A",
"from_id": 54545455,
"receiver_name": "haio",
"targeted_id": 78545445,
"gift_value": '$7'
}]
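# Keep the first row for each targeted_id (keep='first' is the default).
df = pd.DataFrame(data).drop_duplicates(subset=['targeted_id'])
# orient='records' prints a list of objects, like the input;
# the default orient for a DataFrame is 'columns'.
print(df.to_json(orient='records'))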
Answer:
Is that what you want?
source_data = [{
"from_name": "Haio",
"from_id": 183556205,
"receiver_name": "Shubh M",
"targeted_id": 78545445,
"gift_value": '$56'
},
{
"from_name": "Mr. A",
"from_id": 54545455,
"receiver_name": "haio",
"targeted_id": 78545445,
"gift_value": '$7'
}]
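# Map each targeted_id to the first entry seen with that id.
data_by_targeted_id = {}
for entry in source_data:
    if entry["targeted_id"] not in data_by_targeted_id:
        data_by_targeted_id[entry["targeted_id"]] = entry
# Dicts preserve insertion order (Python 3.7+), so the input order is kept.
result = list(data_by_targeted_id.values())
print(result)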
Output:
[{'from_name': 'Haio', 'from_id': 183556205, 'receiver_name': 'Shubh M', 'targeted_id': 78545445, 'gift_value': '$56'}]
It keeps only the first entry for each targeted_id.
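If you would rather keep the last entry for each targeted_id, a dict comprehension does it in one pass: later entries overwrite earlier ones with the same key, while the key keeps the position of its first occurrence. This is a minimal sketch; the sample data is copied from the question and the names are only illustrative.

```python
# Sample records in the same shape as the question's data.
source_data = [
    {"from_name": "Haio", "from_id": 183556205, "receiver_name": "Shubh M",
     "targeted_id": 78545445, "gift_value": "$56"},
    {"from_name": "Mr. A", "from_id": 54545455, "receiver_name": "haio",
     "targeted_id": 78545445, "gift_value": "$7"},
]

# Each later entry with the same targeted_id replaces the earlier one,
# so the LAST dict per targeted_id survives.
last_by_targeted_id = {entry["targeted_id"]: entry for entry in source_data}
result = list(last_by_targeted_id.values())
print(result)
```

Here only the "Mr. A" entry remains, because it is the later of the two records sharing targeted_id 78545445.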
Answer:
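def remove_duplicates(lst, key=lambda x: x, acc=[], keys=[]):
    # acc collects the kept items, keys the keys seen so far.
    # The default lists are never mutated (only + is used), so sharing them is safe.
    if lst == []:
        return acc
    elif key(lst[0]) in keys:
        # Key already seen: skip this item.
        return remove_duplicates(lst[1:], key=key, acc=acc, keys=keys)
    else:
        # New key: keep the item and record its key.
        return remove_duplicates(lst[1:], key=key, acc=acc + [lst[0]], keys=keys + [key(lst[0])])
# Usage: remove_duplicates(source_data, key=lambda d: d["targeted_id"])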