Databricks DataLakeFileClient Returns Error

Time:05-17

I have a Databricks notebook that runs every 5 minutes. Part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2).

I get the following error, which seems to have come out of nowhere, since the process was previously working fine. The "file = " line is written by me; all the parameters are as expected, match the correct file names and containers, and the files do exist in the data lake.

---> 92     file = DataLakeFileClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_access_key, 
     93                                                    file_system_name=azure_container, file_path=location_to_write)
     94 

/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_data_lake_file_client.py in from_connection_string(cls, conn_str, file_system_name, file_path, credential, **kwargs)
    116         :rtype ~azure.storage.filedatalake.DataLakeFileClient
    117         """
--> 118         account_url, _, credential = parse_connection_str(conn_str, credential, 'dfs')
    119         return cls(
    120             account_url, file_system_name=file_system_name, file_path=file_path,

/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_shared/base_client.py in parse_connection_str(conn_str, credential, service)
    402     if service == "dfs":
    403         primary = primary.replace(".blob.", ".dfs.")
--> 404         secondary = secondary.replace(".blob.", ".dfs.")
    405     return primary, secondary, credential

Any thoughts or help? The error is raised in base_client.py, but I don't know what "secondary" is supposed to be or why it would fail there.

CodePudding user response:

After restarting the cluster, something changed, and the following "EndpointSuffix" setting became required in the connection string for this to keep working. I couldn't find any documentation explaining why it worked without it, but until a few days ago it always had:

"DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_access_key + ";EndpointSuffix=core.windows.net"
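
As a minimal sketch of the fix above, the connection string can be assembled in one place so that the EndpointSuffix setting is never left out. The helper name build_connection_string and the placeholder account values are hypothetical and only for illustration; the default suffix core.windows.net is the one from the answer and would differ for other Azure clouds.

```python
def build_connection_string(account_name, account_key,
                            endpoint_suffix="core.windows.net"):
    """Assemble a storage connection string that always carries EndpointSuffix.

    Hypothetical helper, not part of the Azure SDK.
    """
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": endpoint_suffix,
    }
    # Settings are joined as key=value pairs separated by semicolons.
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values, for illustration only.
conn_str = build_connection_string("mystorageaccount", "<access-key>")

# In the notebook, the result would replace the hand-concatenated string:
# from azure.storage.filedatalake import DataLakeFileClient
# file = DataLakeFileClient.from_connection_string(
#     conn_str, file_system_name=azure_container, file_path=location_to_write)
```

Building the string from a dictionary keeps each setting visible, which makes a missing EndpointSuffix easier to spot than in a long concatenation.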