Get the file name which contains the latest timestamp in a directory using Python

Let's say that in a directory I have multiple files like this:

Test1_2021-05-17 1139.xlsx
Test1_2021-04-17 1139.xlsx
Test1_2021-03-17 1139.xlsx
Test1_2021-02-17 1139.xlsx
Test1_2021-01-17 1139.xlsx
Test2_2021-05-17 1139.xlsx
Test2_2021-04-17 1139.xlsx
Test2_2021-03-17 1139.xlsx
Test2_2021-02-17 1139.xlsx

How can I find the file whose name contains the latest timestamp and then open it as a data frame?

So, e.g., I want to get the file name: Test1_2021-05-17 1139.xlsx. How can I do that with Python?

I tried this, but it does not give me the file with the latest timestamp in its name:

import glob
import os

list_of_files = glob.glob('/path/*') 
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)

CodePudding user response：

Maybe you have to filter your filenames first:

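import pathlib
import os.path
import pandas as pd

# 'Test1_*.xlsx' matches names such as 'Test1_2021-05-17 1139.xlsx'
# note: getctime uses filesystem metadata, not the date in the file name
filename = max(pathlib.Path('/path').glob('Test1_*.xlsx'),
               key=os.path.getctime)

# read the workbook into a DataFrame
df = pd.read_excel(filename)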
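
The snippet above picks the file by its creation time on disk, which can differ from the timestamp in the name (for example after files are copied). As a rough sketch, assuming every name follows the `<prefix>_<YYYY-MM-DD HHMM>.xlsx` layout from the question, the timestamp could be parsed from the name instead. The helper names `name_timestamp` and `newest_by_name` are made up for this illustration.

```python
from datetime import datetime
from pathlib import Path


def name_timestamp(path):
    """Parse the 'YYYY-MM-DD HHMM' part of a name like 'Test1_2021-05-17 1139.xlsx'."""
    # stem drops the '.xlsx' suffix; the timestamp follows the first underscore
    stamp = Path(path).stem.split("_", 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%d %H%M")


def newest_by_name(paths):
    """Return the path whose file name carries the latest timestamp."""
    return max(paths, key=name_timestamp)


# Hypothetical candidates; two share a date and differ only in the time part
candidates = [
    Path("/path/Test1_2021-03-17 1139.xlsx"),
    Path("/path/Test1_2021-05-17 0905.xlsx"),
    Path("/path/Test1_2021-05-17 1139.xlsx"),
]
print(newest_by_name(candidates).name)  # Test1_2021-05-17 1139.xlsx
```

On a real directory the candidates would come from something like `Path('/path').glob('Test1_*.xlsx')`, and the result could then be passed to `pandas.read_excel` to get a data frame.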

CodePudding user response：

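If you really need to base it on the filenames, you can pass a lambda function as the key argument of max(), so that the items are compared by the date in their names. Because the dates are written as YYYY-MM-DD, comparing them as plain strings gives the same order as comparing them as dates: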

fNames = '''.../0010/Test1_2021-05-17 1139.xlsx
.../1212/Test1_2021-04-17 1139.xlsx
.../1212/Test1_2021-03-17 1139.xlsx
.../1444/Test1_2021-02-17 1139.xlsx
.../1212/Test1_2021-01-17 1139.xlsx
.../19/Test2_2021-05-17 1139.xlsx
.../1212/Test2_2021-04-17 1139.xlsx
.../1212/Test2_2021-03-17 1139.xlsx
.../1212/Test2_2021-02-17 1139.xlsx'''.splitlines()

# use only files containing 'Test1_' (case-insensitive):
fNames = [f for f in fNames if 'test1_' in f.lower()]

# rsplit removes the directory names.
max_fName = max(
    fNames, key=lambda p: p.rsplit('/', 1)[1].split('_', 1)[1].split(' ', 1)[0]
)
print(max_fName)

# or hard-coded: characters 6 to 15 of the file name hold the date
max_fName = max(fNames, key=lambda p: p.rsplit('/', 1)[1][6:16])
print(max_fName)

Out:

.../0010/Test1_2021-05-17 1139.xlsx
.../0010/Test1_2021-05-17 1139.xlsx
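
The key functions above compare only the date part and look at one prefix at a time. If the latest file for each prefix (Test1, Test2, ...) is wanted, and the time part should count as well, one possible sketch is shown below. It assumes the `<prefix>_<YYYY-MM-DD HHMM>.xlsx` layout from the question; `NAME_RE` and `latest_per_prefix` are hypothetical names, not part of any library.

```python
import re
from datetime import datetime

# Assumed name layout: <prefix>_<YYYY-MM-DD HHMM>.xlsx
NAME_RE = re.compile(r"(?P<prefix>[^/_]+)_(?P<stamp>\d{4}-\d{2}-\d{2} \d{4})\.xlsx$")


def latest_per_prefix(paths):
    """Map each prefix (e.g. 'Test1') to the path with the newest name timestamp."""
    best = {}
    for p in paths:
        m = NAME_RE.search(p)
        if m is None:
            continue  # skip names that do not follow the assumed layout
        stamp = datetime.strptime(m["stamp"], "%Y-%m-%d %H%M")
        prefix = m["prefix"]
        # keep the entry only if it is newer than what was seen so far
        if prefix not in best or stamp > best[prefix][0]:
            best[prefix] = (stamp, p)
    return {prefix: path for prefix, (_, path) in best.items()}


sample = [
    ".../0010/Test1_2021-05-17 1139.xlsx",
    ".../1212/Test1_2021-04-17 1139.xlsx",
    ".../19/Test2_2021-05-17 1139.xlsx",
    ".../1212/Test2_2021-04-17 1139.xlsx",
]
for prefix, path in latest_per_prefix(sample).items():
    print(prefix, "->", path)
```

Parsing with `datetime.strptime` also fails loudly on a malformed date, whereas a plain string comparison would silently sort it somewhere.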