Resource Not Found Error: When Reading CSV from Azure Blob using Pandas with its SAS URL

Time:10-24

I am trying to perform dataset versioning: I read a CSV file into a pandas DataFrame and then create a new version of an Azure ML Dataset. I am running the code below in an Azure CLI job within Azure DevOps.

df = pd.read_csv(blob_sas_url)

At this line, I get a 404 Error. Error Message:

urllib.error.HTTPError: HTTP Error 404: The specified resource does not exist

When I tried this locally, I was able to read the CSV file into a DataFrame. The SAS URL and token have not expired either.

How can I solve this issue?

Edit - Code

import argparse

import pandas as pd
from azureml.core import Run


class Pipeline:
    def __init__(self, args):
        self.args = args
        self.run = Run.get_context()
        self.workspace = self.run.experiment.workspace

    def get_Dataframe(self):
        # Print the URL to check what the script actually received
        print(self.args.blob_sas_url)
        df = pd.read_csv(self.args.blob_sas_url)

        return df

    def create_pipeline(self):
        print("Creating Pipeline")
        print(self.args.blob_sas_url)

        dataframe = self.dataset_to_update()
        # Rest of Code


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Azure ML Dataset Versioning pipeline')

    parser.add_argument('--blob_sas_url', type=str, help='SAS URL to the Data File in Blob Storage')

    args = parser.parse_args()
    ds_versioner = Pipeline(args)
    ds_versioner.create_pipeline()

In both places where the script prints the SAS URL with print(self.args.blob_sas_url), the URL is shortened. I can see this in the std_log.txt file.

CodePudding user response:

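Your input argument is being shortened (technically, trimmed) because the unquoted value is split at each & character. In bash, & is a control operator: it ends the current command and runs it in the background, so everything after the first & in your SAS URL is treated as separate commands rather than as part of the argument. Apparently that is how the value ends up being parsed in the Azure step.

For example: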

python3 test_input.py --blob_sas_url "somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1"
>>> output:  somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1

python3 test_input.py --blob_sas_url somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1
>>> output:  
[1] 1961
[2] 1962
[2]   Done                    sr=b

So you just need to quote the Azure variable in your step command, as follows:

python3 your_python_script.py --blob_sas_url "$(azml.sasURL)"
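
If you want the script to fail early with a clearer message than a 404, you could also check the argument before handing it to pandas. The following is only a minimal sketch: the helper name missing_sas_params is made up for illustration, the sample URL is the placeholder from the example above, and it assumes that a usable SAS URL always carries a sig (signature) query parameter.

```python
from urllib.parse import parse_qs, urlparse


def missing_sas_params(url, required=("sig",)):
    """Return the required query parameters that are absent from url.

    Hypothetical helper: a URL cut off at the first '&' by the shell
    loses everything after it, including the 'sig' parameter.
    """
    query = parse_qs(urlparse(url).query)
    return [name for name in required if name not in query]


# Placeholder URL taken from the example above, not a real SAS URL
full_url = "somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1"
# What the script receives when the shell splits the unquoted value at '&'
truncated_url = full_url.split("&", 1)[0]

for candidate in (full_url, truncated_url):
    missing = missing_sas_params(candidate)
    if missing:
        print("SAS URL looks truncated, missing:", missing)
    else:
        print("SAS URL has the expected parameters")
```

In the script from the question, a check like this could run right after parser.parse_args(), so a truncated URL is reported before pd.read_csv is called.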
