I have a Databricks notebook running every 5 minutes; part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2).
The code now fails with the error below, but it seems to have "come out of nowhere", as the process was previously working fine. The "file = " part is written by me; all the parameters are as expected, match the correct file names/containers, and do exist in the data lake.
---> 92 file = DataLakeFileClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_access_key,
     93     file_system_name=azure_container, file_path=location_to_write)
94
/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_data_lake_file_client.py in from_connection_string(cls, conn_str, file_system_name, file_path, credential, **kwargs)
116 :rtype ~azure.storage.filedatalake.DataLakeFileClient
117 """
--> 118 account_url, _, credential = parse_connection_str(conn_str, credential, 'dfs')
119 return cls(
120 account_url, file_system_name=file_system_name, file_path=file_path,
/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_shared/base_client.py in parse_connection_str(conn_str, credential, service)
402 if service == "dfs":
403 primary = primary.replace(".blob.", ".dfs.")
--> 404 secondary = secondary.replace(".blob.", ".dfs.")
405 return primary, secondary, credential
Any thoughts/help? The actual error is raised inside the base_client.py code, but I don't even know what "secondary" is supposed to be or why there would be an error there.
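For reference, here is a trimmed-down sketch of the failing call; the values are placeholders standing in for my real account name, key, container and path:

from azure.storage.filedatalake import DataLakeFileClient

storage_account_name = "mystorageaccount"            # placeholder
storage_account_access_key = "<account-access-key>"  # placeholder
azure_container = "mycontainer"                      # placeholder (the file system name)
location_to_write = "folder/output.csv"              # placeholder (path within the container)

# Note: no EndpointSuffix here - this is the connection string that fails.
conn_str = (
    "DefaultEndpointsProtocol=https"
    ";AccountName=" + storage_account_name +
    ";AccountKey=" + storage_account_access_key
)

# Get a client for a single file in the container.
file = DataLakeFileClient.from_connection_string(
    conn_str,
    file_system_name=azure_container,
    file_path=location_to_write,
)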
CodePudding user response:
For some reason, after restarting the cluster, something changed and the following "EndpointSuffix" was required for this to continue working. I couldn't find any docs on why it ever worked without it, but until a few days ago it always had:
"DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_access_key + ";EndpointSuffix=core.windows.net"