I need to extract the string with the latest timestamp from a list.
The variables are in the format below:

|Name|
|:---|
|First_Record2022-10-11_NameofRecord.txt|
|Second_Record_20221017.txt|

For now, I am fetching these into a list and iterating in a for loop to get the date from each of the two records, using the code below:
```python
import re
import datetime as DT

for index, rows in df.iterrows():
    # drop hyphens so both date styles become YYYYMMDD
    datestr = rows['Name'].replace('-', '')
    datestr = re.search(r'\d{8}|\d{6}', datestr).group()
    date = DT.datetime.strptime(datestr, '%Y%m%d')
    print('{:23}-->{}'.format(rows['Name'], date))
```
But this only gives me the date back. How do I compare the two strings and find the one with the latest date? For example, when comparing "First_Record2022-10-11_NameofRecord.txt" and "Second_Record_20221017.txt", I should get "Second_Record_20221017.txt" as the result.
CodePudding user response:
IIUC, is this what you're looking for?
df['date']= df['Name'].str.extract(r'(\d{4}.*?(?=[_|\.]))').replace(r'-','',regex=True)
df.sort_values('date').tail(1)['Name'].squeeze()
'Second_Record_20221017.txt'
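
If the names are in a plain Python list rather than a DataFrame, something along these lines might also work. This is only a sketch using the standard library: the helper names `extract_date` and `latest_name` are made up for illustration, and it assumes every name contains exactly one date written as either `YYYY-MM-DD` or `YYYYMMDD`.

```python
import re
from datetime import datetime

# Matches YYYY-MM-DD or YYYYMMDD; the hyphens are optional (assumed formats).
DATE_RE = re.compile(r'(\d{4})-?(\d{2})-?(\d{2})')

def extract_date(name):
    """Return the first date embedded in name (hypothetical helper)."""
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f'no date found in {name!r}')
    # Join year, month and day, then parse into a real datetime for comparison.
    return datetime.strptime(''.join(match.groups()), '%Y%m%d')

def latest_name(names):
    """Return the name whose embedded date is the most recent (hypothetical helper)."""
    return max(names, key=extract_date)

names = ['First_Record2022-10-11_NameofRecord.txt', 'Second_Record_20221017.txt']
print(latest_name(names))
```

Since `max` compares the parsed `datetime` values but returns the original element, you get the full file name back instead of just the date. With a DataFrame you could pass `df['Name'].tolist()` to the same function.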