Running a bash command from Python to read from an S3 bucket and saving the output

Time:10-14

I'm trying to run the following bash command from Python and save its output into a variable. I'm new to using bash, so any help will be appreciated.

Here's my use case: I have data stored in an S3 bucket (let's say the path is s3://test-bucket/folder1/subd1/datafiles/).

In the datafiles folder there are multiple data files:

a1_03_27_2020_N.csv
a1_04_05_2021_O.csv
a1_07_16_2021_N.csv

I'm trying to select the latest file (in this case a1_07_16_2021_N.csv) and then read that data file using pandas.

Here's what I have so far.

The command to select the latest file:

ls -t a1*|head -1

But then I'm not sure how to (1) run that command from Python and (2) save the output as a variable. I know this is not correct, but something like:

latest_file = os.environ['ls -t a1*|head -1']

Then read the file:

df = pd.read_csv(latest_file)

Thank you in advance again!

Answer:

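Python can do most of what the shell does here. You can do the search and filtering in Python itself, with no need to shell out. Note that pathlib works on local paths, so this assumes the bucket contents are available on the local filesystem (for example, mounted or synced).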

from pathlib import Path

dir_to_search = Path("test-bucket/folder1/subd1/datafiles/")
try:
    latest = max(dir_to_search.glob("a1*.csv"), key=lambda path: path.stat().st_mtime)
    print(latest)
except ValueError:
    print("no csv here")
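
The snippet above picks the file with the newest modification time. If the files were copied or synced, that time may not match the date in the file name, so here is a minimal sketch that sorts by the date embedded in the name instead. It assumes every name follows the a1_MM_DD_YYYY_X.csv pattern from the question; the helper names date_from_name and latest_by_name are made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def date_from_name(path):
    """Parse the MM_DD_YYYY part of a name like a1_07_16_2021_N.csv."""
    # The stem is "a1_07_16_2021_N"; the middle three fields are month, day, year.
    # Assumption: every file name has exactly these five underscore-separated fields.
    _, month, day, year, _ = Path(path).stem.split("_")
    return datetime(int(year), int(month), int(day))


def latest_by_name(paths):
    """Return the path whose file name carries the most recent date."""
    # Like the glob version, max() raises ValueError when there are no paths.
    return max(paths, key=date_from_name)


names = ["a1_03_27_2020_N.csv", "a1_04_05_2021_O.csv", "a1_07_16_2021_N.csv"]
print(latest_by_name(names))  # a1_07_16_2021_N.csv
```

The same function accepts Path objects, so it should also work on the result of dir_to_search.glob("a1*.csv").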

But if you do want to run the shell command, several functions in subprocess will do it. For instance:

import subprocess as subp

result = subp.run(
    "ls -t test-bucket/folder1/subd1/datafiles/a1* | head -1",
    shell=True, capture_output=True, text=True,
).stdout.strip()
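
Both approaches above work on local files. If the files live only in S3, one option is to list the objects with boto3 and pick the newest one. This is a rough sketch, not a tested recipe for your setup: it assumes boto3 is installed and AWS credentials are configured, and the helper names (pick_latest, list_objects, latest_s3_url) are made up for illustration. Note that LastModified is the upload time, which may differ from the date in the file name.

```python
from datetime import datetime


def pick_latest(objects):
    """Return the key of the most recently modified object, or None if there are none."""
    newest = max(objects, key=lambda obj: obj["LastModified"], default=None)
    return newest["Key"] if newest else None


def list_objects(bucket, prefix):
    """Yield object records under s3://bucket/prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so pick_latest can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from the response when nothing matches the prefix.
        yield from page.get("Contents", [])


def latest_s3_url(bucket, prefix):
    """Build an s3:// URL for the newest object under the prefix, or None."""
    key = pick_latest(list_objects(bucket, prefix))
    return f"s3://{bucket}/{key}" if key else None


# Offline demo with made-up records shaped like list_objects_v2 entries.
fake_records = [
    {"Key": "folder1/subd1/datafiles/a1_03_27_2020_N.csv", "LastModified": datetime(2020, 3, 27)},
    {"Key": "folder1/subd1/datafiles/a1_07_16_2021_N.csv", "LastModified": datetime(2021, 7, 16)},
]
print(pick_latest(fake_records))
```

With real credentials, something like latest_s3_url("test-bucket", "folder1/subd1/datafiles/a1") should give a URL that pd.read_csv can read directly, provided the s3fs package is installed.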