A considerable portion of my Azure Databricks cost goes to the storage account.
Following the steps in the "Azure Databricks - cost optimization" part, I am thinking of storing data (i.e., Delta tables, views, functions, etc.) in mounted object storage (such as Blob Storage) rather than in the DBFS root, so that I can use the cold/archive tiers of Blob Storage to reduce cost.
Is using Blob Storage rather than the DBFS root a valid approach?
Will it really reduce what I spend on Azure Databricks storage?
Will doing so cause any performance issues?
CodePudding user response:
The DBFS root is generally not recommended for storing production data, for several reasons: it cannot be accessed from outside the workspace, you have no control over the data lifecycle, etc. So you should use a separate storage account for that data.
But do most of your costs really come from storage? Usually storage is a much smaller share than compute and other costs.
The Databricks field team recently released a very good blog post about cost management and optimization.
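To illustrate the separate storage account suggestion above, here is a minimal sketch of pointing a Delta table at an external location instead of the DBFS root. It assumes an ADLS Gen2 account addressed through an `abfss://` URI; the container, account, and table names are hypothetical, and the helper functions are only for illustration, not part of any Databricks API.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    path = path.strip("/")  # avoid doubled or trailing slashes
    return f"{base}/{path}" if path else base


def external_delta_table_ddl(table: str, location: str) -> str:
    """Return DDL for an external Delta table whose files live at `location`."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"


# Hypothetical container, storage account, and table names.
location = abfss_uri("lake", "mystorageacct", "/sales/orders/")
ddl = external_delta_table_ddl("sales.orders", location)
print(ddl)

# In a Databricks notebook, where `spark` is predefined and access to the
# storage account is already configured, one could then run:
#   spark.sql(ddl)
```

Because the table files then live in your own storage account, the lifecycle and access policies of that account apply to them, which is the kind of control the DBFS root does not give you.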