I'm trying to run the following bash command from Python and save its output in a variable. I'm new to bash, so any help would be appreciated.
Here's my use case: I have data stored in an S3 bucket (let's say the path is s3://test-bucket/folder1/subd1/datafiles/).
The datafiles folder contains multiple data files:
a1_03_27_2020_N.csv
a1_04_05_2021_O.csv
a1_07_16_2021_N.csv
I'm trying to select the latest file (in this case a1_07_16_2021_N.csv) and then read it using pandas.
Here's what I have so far.
The command to select the latest file:
ls -t a1*|head -1
but I'm not sure how to (1) run that command from Python and (2) save the output in a variable. I know this is not correct, but something like:
latest_file = os.environ['ls -t a1*|head -1']
Then read the file:
df = pd.read_csv(latest_file)
Thank you in advance again!
CodePudding user response:
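Python can do most of what the shell does here, so you can search and filter in Python itself without calling out to a shell. The snippet below picks the most recently modified file, like ls -t, and assumes the files are reachable through a local path (for example, a synced or mounted copy of the bucket).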
from pathlib import Path

dir_to_search = Path("test-bucket/folder1/subd1/datafiles/")
try:
    # max() raises ValueError when glob() matches nothing
    latest = max(dir_to_search.glob("a1*.csv"), key=lambda path: path.stat().st_mtime)
    print(latest)
except ValueError:
    print("no csv here")
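Modification time may not match the date in the file name (for instance after a re-upload or a sync). As a minimal sketch, assuming every name follows the a1_MM_DD_YYYY_X.csv pattern shown in the question, the date can be parsed from the name instead. The helper names date_from_name and latest_by_name are made up for this illustration.

```python
from datetime import datetime
from pathlib import Path


def date_from_name(path):
    """Parse the date out of a name like 'a1_07_16_2021_N.csv' (assumed pattern)."""
    # stem 'a1_07_16_2021_N' splits into ['a1', '07', '16', '2021', 'N']
    _, month, day, year, _ = Path(path).stem.split("_")
    return datetime(int(year), int(month), int(day))


def latest_by_name(paths):
    """Return the path whose embedded date is the most recent."""
    # Like the snippet above, max() raises ValueError on an empty input
    return max(paths, key=date_from_name)


names = ["a1_03_27_2020_N.csv", "a1_04_05_2021_O.csv", "a1_07_16_2021_N.csv"]
print(latest_by_name(names))
```

The same function accepts the Path objects produced by dir_to_search.glob("a1*.csv"), so the result can be passed straight to pd.read_csv.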
But if you do want to run the shell command, several functions in subprocess will do it and give you the output as a string. For instance:
import subprocess as subp

result = subp.run("ls -t test-bucket/folder1/subd1/datafiles/a1* | head -1",
                  shell=True,
                  capture_output=True, text=True).stdout.strip()
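If you would rather avoid shell=True, here is a rough sketch of the same idea that runs ls directly and does the a1* filtering and the head -1 step in Python. It assumes a POSIX system where ls is available; newest_via_ls is a hypothetical helper name, not part of subprocess.

```python
import subprocess


def newest_via_ls(directory, prefix="a1"):
    """Return the most recently modified file name starting with prefix, or None."""
    # Passing a list runs ls directly, with no shell, so there is no glob or pipe.
    # -1 forces one name per line, -t sorts by modification time, newest first.
    completed = subprocess.run(
        ["ls", "-1t"],
        cwd=directory,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ls fails
    )
    # Equivalent of `a1* | head -1`: the first listed name with the prefix
    for name in completed.stdout.splitlines():
        if name.startswith(prefix):
            return name
    return None
```

The return value is a bare file name, so join it with the directory (for example with os.path.join) before handing it to pd.read_csv.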