I am new to Azure. I am currently following a tutorial on Azure Synapse Analytics. So far I am learning about how the data is stored. My understanding so far is that the container is contained within a workspace and the workspace is contained within an Azure Data Lake Gen2 account. The container itself contains a blob storage. I may be wrong but this is what I have understood so far.
Then I went to 'Browse Gallery' and added the 'Bing Covid-19 Data'. I noticed that it created an Azure Blob Storage and that the data lies within that.
If both an Azure Data Lake Storage Gen2 account and an Azure Blob Storage account contain blob storage, why are the blobs stored differently from one another? And if the container in the Gen2 account does not contain blob storage, what does it contain?
Any help would be greatly appreciated.
CodePudding user response:
Azure Data Lake Storage Gen2 (ADLS) is an extended form of Blob Storage with the addition of a hierarchical namespace (real directories rather than a flat list of blob names) - so all ADLS is Blob Storage, but not all Blob Storage is ADLS.
There are numerous benefits to ADLS: better performance, larger size limits, additional security control, and persistent folders. ADLS is highly recommended for parallel systems (like Spark and Serverless SQL). For these reasons, Synapse workspaces require an ADLS account to serve as their root. The workspace will use this root to store metadata and some physical data (such as in the case of a Lake Database). Synapse can connect to multiple ADLS accounts and also (as you have seen) regular Blob Storage accounts.
Just to be accurate:
- The "container" is not "contained within the workspace"; it is defined by the ADLS account.
- The workspace is not "contained within ADLS", rather the ADLS account is attached to the workspace.
- The workspace is dependent on the ADLS account, but the ADLS account is independent of the workspace. This means you can still deal with it as you would any other ADLS account from non-workspace assets.
- Containers are an inherent part of Blob Storage (and consequently of ADLS). All blobs are stored inside containers.
I have not used the Gallery to import data, so I'm not sure why it would create the data in a Blob Storage account. Regardless, you can either connect to it directly as in your example or move the data into your ADLS account.
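To make the account / container / blob hierarchy described above a bit more concrete, here is a minimal sketch of how the same kind of path is usually addressed on each type of account. The `wasbs://` and `abfss://` schemes and the `blob.core.windows.net` / `dfs.core.windows.net` endpoints are the standard ones; the account, container and file names are hypothetical placeholders, and the helper functions are only for illustration.

```python
def blob_uri(account: str, container: str, path: str) -> str:
    """Build a wasbs:// URI that addresses a blob through the Blob endpoint."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def adls_uri(account: str, container: str, path: str) -> str:
    """Build an abfss:// URI that addresses a file through the ADLS Gen2 (dfs) endpoint."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical names: replace with your own storage account and container.
    print(blob_uri("mysamplestore", "public", "covid/bing_covid.parquet"))
    print(adls_uri("mydatalake", "raw", "covid/bing_covid.parquet"))
```

In both cases the container belongs to the storage account, not to the workspace; only the endpoint and scheme differ.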
CodePudding user response:
The sample dataset you added from the gallery uses an Azure Blob Storage connector for its sink, which is why its data appears under Azure Blob Storage. The Synapse workspace itself, by default, uses an ADLS Gen2 account as its storage layer, with RA-GRS redundancy and hierarchical namespace enabled.
As for why you see two kinds of storage account in your workspace when both store blobs: Azure supports several storage options. Azure Blob Storage stores objects as blobs, whereas ADLS Gen2 is designed mainly for big data and analytics workloads.
ADLS Gen2 is a newer-generation storage solution that offers improved performance and integration with newer technologies.
When you create datasets and linked services for your data sources, you will see the different types of connectors supported for the various data sources.
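As a rough illustration of those connector types, the sketch below builds the skeleton of a linked service definition for each kind of account. The `AzureBlobStorage` and `AzureBlobFS` type names are the ones used for the Blob Storage and ADLS Gen2 connectors; the linked service names and account names are hypothetical, and authentication settings are deliberately left out, so treat this as an outline rather than a definition you can deploy as-is.

```python
import json


def blob_linked_service(name: str, account: str) -> dict:
    """Skeleton of a linked service that uses the Azure Blob Storage connector."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                # Blob endpoint of the storage account (authentication omitted).
                "serviceEndpoint": f"https://{account}.blob.core.windows.net/"
            },
        },
    }


def adls_linked_service(name: str, account: str) -> dict:
    """Skeleton of a linked service that uses the ADLS Gen2 connector."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                # dfs endpoint of the storage account (authentication omitted).
                "url": f"https://{account}.dfs.core.windows.net"
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(blob_linked_service("SampleBlobLS", "mysamplestore"), indent=2))
    print(json.dumps(adls_linked_service("WorkspaceLakeLS", "mydatalake"), indent=2))
```

The two definitions differ only in connector type and endpoint, which matches what you see in the workspace: the gallery sample is reached through the Blob connector, while the workspace's default storage goes through the ADLS Gen2 connector.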