I have the list of dictionaries below:
recs = [{
'Name': 'aRe',
'Email': '[email protected]',
'timestamp': '2021-11-29T04:33:28.138522Z'
},
{
'Name': 'Umar',
'Email': '[email protected]',
'timestamp': '2021-11-28T04:33:28.138522Z'
},
{
'Name': 'Are',
'Email': '[email protected]',
'timestamp': '2021-11-27T04:33:28.138522Z'
},
{
'Name': 'arE',
'Email': '[email protected]',
'timestamp': '2021-11-28T06:59:58.975864Z'
},
{
'Name': 'umaR',
'Email': '[email protected]',
'timestamp': '2021-11-29T04:33:28.138522Z'
},
{
'Name': 'Sc',
'Email': '[email protected]',
'timestamp': '2022-02-01T15:02:12.301701Z'
}
]
- If the same 'Name' appears more than once (ignoring letter case), keep only the dict with the latest timestamp
Expected output:
[{'Name': 'umaR',
'Email': '[email protected]',
'timestamp': '2021-11-29T04:33:28.138522Z'},
{'Name': 'aRe',
'Email': '[email protected]',
'timestamp': '2021-11-29T04:33:28.138522Z'},
{'Name': 'Sc',
'Email': '[email protected]',
'timestamp': '2022-02-01T15:02:12.301701Z'}]
Code is below
from itertools import groupby
filtered_recs = []
for key, group_iter in groupby(recs, lambda rec: rec['Name'].lower()):
    recent_rec = max(group_iter, key=lambda rec: rec['timestamp'])
    filtered_recs.append(recent_rec)
filtered_recs
My code works fine when all the 'Name' values are in the same case,
e.g. 'are', 'umar', 'sc', but not when the names use irregular letter case.
CodePudding user response:
Sort the recs first, because itertools.groupby only groups consecutive items that share the same key:
from itertools import groupby
filtered_recs = []
recs = sorted(recs, key=lambda rec: rec["Name"].lower()) # <-- sort before groupby
for key, group_iter in groupby(recs, lambda rec: rec["Name"].lower()):
    recent_rec = max(group_iter, key=lambda rec: rec["timestamp"])
    filtered_recs.append(recent_rec)
print(filtered_recs)
Prints:
[
{
"Name": "aRe",
"Email": "[email protected]",
"timestamp": "2021-11-29T04:33:28.138522Z",
},
{
"Name": "Sc",
"Email": "[email protected]",
"timestamp": "2022-02-01T15:02:12.301701Z",
},
{
"Name": "umaR",
"Email": "[email protected]",
"timestamp": "2021-11-29T04:33:28.138522Z",
},
]
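To see why the sort matters, here is a small sketch that only looks at the group keys groupby produces. It uses the names from the question; the helper group_keys is a made-up name for illustration, not part of itertools.

```python
from itertools import groupby

# Names in the same order as in the question's list.
names = ["aRe", "Umar", "Are", "arE", "umaR", "Sc"]

def group_keys(items):
    # Hypothetical helper: collect the key of every group groupby yields, in order.
    return [key for key, _ in groupby(items, key=str.lower)]

unsorted_keys = group_keys(names)
sorted_keys = group_keys(sorted(names, key=str.lower))

print(unsorted_keys)  # ['are', 'umar', 'are', 'umar', 'sc']
print(sorted_keys)    # ['are', 'sc', 'umar']
```

Without sorting, 'are' and 'umar' each show up as two separate groups, so max() is applied to each fragment and duplicates survive.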
EDIT: Version without sort:
filtered_recs = {}
for r in recs:
    filtered_recs.setdefault(r["Name"].lower(), []).append(r)
for k, v in filtered_recs.items():
    filtered_recs[k] = max(v, key=lambda rec: rec["timestamp"])
print(list(filtered_recs.values()))
Prints:
[
{
"Name": "aRe",
"Email": "[email protected]",
"timestamp": "2021-11-29T04:33:28.138522Z",
},
{
"Name": "umaR",
"Email": "[email protected]",
"timestamp": "2021-11-29T04:33:28.138522Z",
},
{
"Name": "Sc",
"Email": "[email protected]",
"timestamp": "2022-02-01T15:02:12.301701Z",
},
]
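A possible variation, as a rough sketch: keep only the latest record per name in a single pass, and compare parsed datetimes instead of raw strings. Comparing the strings works here because every timestamp has the same ISO 8601 layout; parsing is safer if that is not guaranteed. The names parse_ts and latest_per_name are made up for this example, the format string assumes timestamps shaped exactly like the ones in the question, and the sample omits the Email field for brevity.

```python
from datetime import datetime

def parse_ts(ts):
    # Assumption: every timestamp looks like '2021-11-29T04:33:28.138522Z'.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def latest_per_name(records):
    latest = {}  # lowercased name -> most recent record seen so far
    for rec in records:
        key = rec["Name"].lower()
        current = latest.get(key)
        # On a tie the record seen first is kept, like max() does.
        if current is None or parse_ts(rec["timestamp"]) > parse_ts(current["timestamp"]):
            latest[key] = rec
    return list(latest.values())

sample = [
    {"Name": "aRe", "timestamp": "2021-11-29T04:33:28.138522Z"},
    {"Name": "Are", "timestamp": "2021-11-27T04:33:28.138522Z"},
    {"Name": "Sc", "timestamp": "2022-02-01T15:02:12.301701Z"},
    {"Name": "arE", "timestamp": "2021-11-28T06:59:58.975864Z"},
]
result = latest_per_name(sample)
print(result)
```

The result follows the order in which each name first appears in the input, since dicts preserve insertion order.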