How to extract only the elements with the latest timestamp from JSON, ignoring case

Time:04-28

I have the list of dictionaries below:

recs = [{
    'Name': 'aRe',
    'Email': '[email protected]',
    'timestamp': '2021-11-29T04:33:28.138522Z'
  },
  {
    'Name': 'Umar',
    'Email': '[email protected]',
    'timestamp': '2021-11-28T04:33:28.138522Z'
  },
  
  {
    'Name': 'Are',
    'Email': '[email protected]',
    'timestamp': '2021-11-27T04:33:28.138522Z'
  },
  
  {
    'Name': 'arE',
    'Email': '[email protected]',
    'timestamp': '2021-11-28T06:59:58.975864Z'
  },
    {
    'Name': 'umaR',
    'Email': '[email protected]',
    'timestamp': '2021-11-29T04:33:28.138522Z'
  },
  {
    'Name': 'Sc',
    'Email': '[email protected]',
    'timestamp': '2022-02-01T15:02:12.301701Z'
  }
]
  • If the same name appears more than once (ignoring case), extract only the dict with the latest timestamp.

Expected output:

[{'Name': 'umaR',
  'Email': '[email protected]',
  'timestamp': '2021-11-29T04:33:28.138522Z'},
 {'Name': 'aRe',
  'Email': '[email protected]',
  'timestamp': '2021-11-29T04:33:28.138522Z'},
 {'Name': 'Sc',
  'Email': '[email protected]',
  'timestamp': '2022-02-01T15:02:12.301701Z'}]

My code is below:

from itertools import groupby
filtered_recs = []
for key, group_iter in groupby(recs, lambda rec: rec['Name'].lower()):
    recent_rec = max(group_iter, key = lambda rec: rec['timestamp'])
    filtered_recs.append(recent_rec)
filtered_recs

My code works fine if every 'Name' is in the same case (for example 'are', 'umar', 'sc'), but not when the names use mixed case.

CodePudding user response:

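Sort the recs first. groupby only groups consecutive items that share a key, so records with the same lowercased name must be adjacent: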

from itertools import groupby

filtered_recs = []

recs = sorted(recs, key=lambda rec: rec["Name"].lower())  # <-- sort before groupby

for key, group_iter in groupby(recs, lambda rec: rec["Name"].lower()):
    recent_rec = max(group_iter, key=lambda rec: rec["timestamp"])
    filtered_recs.append(recent_rec)

print(filtered_recs)

Prints:

[
    {
        "Name": "aRe",
        "Email": "[email protected]",
        "timestamp": "2021-11-29T04:33:28.138522Z",
    },
    {
        "Name": "Sc",
        "Email": "[email protected]",
        "timestamp": "2022-02-01T15:02:12.301701Z",
    },
    {
        "Name": "umaR",
        "Email": "[email protected]",
        "timestamp": "2021-11-29T04:33:28.138522Z",
    },
]
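
To see why the sort matters, here is a small sketch using only the names from the question (the variable names are mine, not from the original code). Without sorting, groupby starts a new group every time the key changes, so 'are' and 'umar' each show up twice:

```python
from itertools import groupby

# Names in the same order as the question's list.
names = ["aRe", "Umar", "Are", "arE", "umaR", "Sc"]

# Without sorting: a new group starts whenever the lowercased key changes.
unsorted_keys = [key for key, _ in groupby(names, key=str.lower)]

# With sorting: equal keys are adjacent, so each key yields one group.
sorted_names = sorted(names, key=str.lower)
sorted_keys = [key for key, _ in groupby(sorted_names, key=str.lower)]

print(unsorted_keys)  # ['are', 'umar', 'are', 'umar', 'sc']
print(sorted_keys)    # ['are', 'sc', 'umar']
```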

EDIT: Version without sort:

filtered_recs = {}
for r in recs:
    filtered_recs.setdefault(r["Name"].lower(), []).append(r)

for k, v in filtered_recs.items():
    filtered_recs[k] = max(v, key=lambda rec: rec["timestamp"])

print(list(filtered_recs.values()))

Prints:

[
    {
        "Name": "aRe",
        "Email": "[email protected]",
        "timestamp": "2021-11-29T04:33:28.138522Z",
    },
    {
        "Name": "umaR",
        "Email": "[email protected]",
        "timestamp": "2021-11-29T04:33:28.138522Z",
    },
    {
        "Name": "Sc",
        "Email": "[email protected]",
        "timestamp": "2022-02-01T15:02:12.301701Z",
    },
]
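
A possible single-pass variant, as a sketch only: it keeps just the newest record per name instead of building a list per name first. The function name latest_per_name is hypothetical, and the code assumes every timestamp follows the exact format shown in the question. It also swaps in str.casefold() for lower() and parses the timestamps with datetime; the plain string comparison used above gives the same ordering as long as all timestamps share this one format. The sample data omits the Email field for brevity.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the question's sample data.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def latest_per_name(records):
    """Return the newest record for each name, ignoring case."""
    latest = {}  # casefolded name -> (parsed timestamp, record)
    for rec in records:
        key = rec["Name"].casefold()
        ts = datetime.strptime(rec["timestamp"], TS_FORMAT)
        # Keep the record if the name is new or this timestamp is later.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, rec)
    return [rec for _, rec in latest.values()]


sample = [
    {"Name": "aRe", "timestamp": "2021-11-29T04:33:28.138522Z"},
    {"Name": "Umar", "timestamp": "2021-11-28T04:33:28.138522Z"},
    {"Name": "Are", "timestamp": "2021-11-27T04:33:28.138522Z"},
    {"Name": "arE", "timestamp": "2021-11-28T06:59:58.975864Z"},
    {"Name": "umaR", "timestamp": "2021-11-29T04:33:28.138522Z"},
    {"Name": "Sc", "timestamp": "2022-02-01T15:02:12.301701Z"},
]

result = latest_per_name(sample)
print([rec["Name"] for rec in result])  # ['aRe', 'umaR', 'Sc']
```

The result follows the order in which each name first appears in the input, since dicts preserve insertion order.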