Let's say that I have multiple files like this in a directory:
Test1_2021-05-17 1139.xlsx
Test1_2021-04-17 1139.xlsx
Test1_2021-03-17 1139.xlsx
Test1_2021-02-17 1139.xlsx
Test1_2021-01-17 1139.xlsx
Test2_2021-05-17 1139.xlsx
Test2_2021-04-17 1139.xlsx
Test2_2021-03-17 1139.xlsx
Test2_2021-02-17 1139.xlsx
How can I find the file whose name contains the latest timestamp and then open it as a data frame?
For example, I want to get the file name Test1_2021-05-17 1139.xlsx. How can I do that with Python?
I tried this, but it does not give me the file with the latest timestamp in its name:
import glob
import os
list_of_files = glob.glob('/path/*')
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
CodePudding user response:
Maybe you need to filter your filenames first:
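import pathlib
import os.path
import pandas as pd
# keep only the Test1_ workbooks, then take the one with the newest ctime
# (ctime is a filesystem time, not the timestamp written in the name)
filename = max(pathlib.Path('/path').glob('Test1_*.xlsx'),
               key=os.path.getctime)
# load the workbook into a data frame
df = pd.read_excel(filename)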
CodePudding user response:
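If you really need to base it on the filenames, you can pass a lambda
function to max() as the key, so that the items are compared by the date in their names.
This works because YYYY-MM-DD strings sort chronologically when compared as text: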
fNames = '''.../0010/Test1_2021-05-17 1139.xlsx
.../1212/Test1_2021-04-17 1139.xlsx
.../1212/Test1_2021-03-17 1139.xlsx
.../1444/Test1_2021-02-17 1139.xlsx
.../1212/Test1_2021-01-17 1139.xlsx
.../19/Test2_2021-05-17 1139.xlsx
.../1212/Test2_2021-04-17 1139.xlsx
.../1212/Test2_2021-03-17 1139.xlsx
.../1212/Test2_2021-02-17 1139.xlsx'''.splitlines()
# use only files containing 'Test1_':
fNames = [f for f in fNames if 'test1_' in f.lower()]
# rsplit drops the directory part; the two splits then isolate the date string.
max_fName = max(
    fNames, key=lambda p: p.rsplit('/', 1)[1].split('_', 1)[1].split(' ', 1)[0]
)
print(max_fName)
# or hard-coded: characters 6 to 15 of the file name are the date
max_fName = max(fNames, key=lambda p: p.rsplit('/', 1)[1][6:16])
print(max_fName)
Out:
.../0010/Test1_2021-05-17 1139.xlsx
.../0010/Test1_2021-05-17 1139.xlsx
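The filename-based approach above can also be sketched with an explicit timestamp parse, so that the time of day is compared as well as the date. This is only a sketch: the helper names stamp_of and latest_by_name and the STAMP_RE pattern are made up for illustration, and it assumes every file is named <prefix>_<YYYY-MM-DD> <HHMM>.xlsx as in the question.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed filename pattern: <prefix>_<YYYY-MM-DD> <HHMM>.xlsx
STAMP_RE = re.compile(r"_(\d{4}-\d{2}-\d{2} \d{4})\.xlsx$")


def stamp_of(path):
    """Return the datetime embedded in the file name, or None if there is none."""
    match = STAMP_RE.search(Path(path).name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H%M")


def latest_by_name(paths, prefix="Test1_"):
    """Pick the path whose file name carries the most recent timestamp."""
    candidates = [
        p for p in paths
        if Path(p).name.startswith(prefix) and stamp_of(p) is not None
    ]
    if not candidates:
        return None
    return max(candidates, key=stamp_of)


names = [
    "Test1_2021-05-17 1139.xlsx",
    "Test1_2021-04-17 1139.xlsx",
    "Test2_2021-05-17 1139.xlsx",
]
print(latest_by_name(names))
```

With real files, the list could come from something like pathlib.Path('/path').glob('*.xlsx'), and the returned path could then be passed to pd.read_excel() to load it as a data frame.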