I have no experience with Azure Synapse, but my understanding is that it is the same as Databricks, ADF, ADLS Gen2 and Hive in SQL DWH, all brought together in one workspace under a different name.
Am I wrong?
CodePudding user response:
Yes, in many contexts Azure Synapse and Databricks provide the same Big Data analytics approach, but there are also a few differences between these services.
With the newer functionality in Synapse, we now see some of the same features as in Databricks (e.g. Spark, Delta), which raises the question of how Synapse compares to Databricks and when to use which.
Yes, both have Spark but…
Databricks
- has a proprietary data processing engine (Databricks Runtime) built on a highly optimized version of Apache Spark, for which Databricks claims up to 50x better performance
- already has support for Spark 3.0
- allows users to opt for GPU enabled clusters and choose between standard and high-concurrency cluster mode
Synapse
- uses open-source Apache Spark (and thus does not include all features of Databricks Runtime)
- has built-in support for .NET for Spark applications
Yes, both have notebooks
Synapse
- uses Nteract notebooks
- has co-authoring of notebooks, but one person needs to save the notebook before another person sees the change
- doesn’t have automated versioning
Databricks
- uses Databricks notebooks
- has real-time co-authoring (both authors see the changes in real time)
- has automated versioning
Yes, both can access data from a data lake
Synapse
- When creating a Synapse workspace, you select a data lake that becomes its primary data lake (you can query it directly from scripts and notebooks)
Databricks
- You need to mount a data lake (or otherwise configure access to it) before using it
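
To make the difference in data lake access a bit more concrete, here is a minimal sketch of the two path styles a notebook typically ends up using. It only builds strings, so it runs without a Spark cluster. The container name `raw`, the storage account `mydatalake`, the file path and the helper names `abfss_uri` and `mount_path` are all hypothetical and chosen for illustration; the `/mnt/<name>` location is a common Databricks convention, not a requirement.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def mount_path(mount_name: str, path: str = "") -> str:
    """Build a path under a Databricks mount point, assumed here to live at /mnt/<name>."""
    return f"/mnt/{mount_name}/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
synapse_style = abfss_uri("raw", "mydatalake", "sales/2021.parquet")
databricks_style = mount_path("raw", "sales/2021.parquet")

print(synapse_style)
print(databricks_style)

# In a notebook, either string would then be handed to Spark, e.g.
#   df = spark.read.parquet(synapse_style)
# On Databricks the mount is created once beforehand, typically with
#   dbutils.fs.mount(source=..., mount_point="/mnt/raw", extra_configs=...)
# where the credentials in extra_configs depend on your setup.
```

In Synapse the first form works against the primary data lake without extra setup, whereas in Databricks the second form only works after the mount has been created.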
Yes, both leverage Delta
Synapse
- uses Delta Lake, which is open source
Databricks
- Has Databricks Delta, which is built on the open-source Delta Lake but offers some extra optimizations
No, they are not the same
Synapse
- Has both a traditional SQL engine (to fit traditional BI developers) and a Spark engine (to fit data scientists, analysts & engineers)
- Is both a data warehouse (i.e. Synapse Analytics) and an interface tool (i.e. Synapse Studio)
Databricks
- Is not a data warehouse tool but rather a Spark-based notebook tool
- Has a focus on Spark, Delta Engine, MLflow and MLR
No, they don’t offer the same developer experience
Synapse
- Currently offers a developer experience for Spark development only through Synapse Studio (not through local IDEs)
- Doesn’t yet have Git integrated within the Synapse Studio notebooks
Databricks
- Offers a developer experience within the Databricks UI, through Databricks Connect (i.e. remote connection from Visual Studio Code, PyCharm, etc.) and soon through the Jupyter & RStudio UI within Databricks