I hope this is ok to ask here. I have been looking through so many sites but still unable to come up with a decision.
Here's the scenario. I have a legacy application that has it's data in Sql Server database(s). A new application has now been created that also will be storing data in a Sql Server database. I need to now migrate the data from the legacy to the new application. The legacy database(s) structures have been modified in the new application to follow best practices and make it more efficient (eg: use of PK, FK, indexes, lookups, better table structures etc). So there will be a lot of transformation (lookups, data cleaning, merging/splitting data etc) happening from source to destination. Initially we will be doing only 5 years worth of data, but at a later point we may need to move the rest of the data across.
The company uses Azure for storage and there is no on-prem resources.
Given this situation, what would be the best option for Data Migration? SSIS or ADF? What are the advantages of one over the other (other than the fact that ADF is Azure cloud based and MS are probably moving to ADF more in the future). We will also need Dev/Test/Prod environments if that matters.
CodePudding user response:
Considering the company doesn't have on-prem resources, I would be looking at implementing the data migration on Azure Data Factory. Below are few points to take into consideration:
Pros:
- Integration between ADF and other Azure resources e.g. SQL Database is seamless and doesn't require connector setup, etc.
- You would take advantage of Microsoft's network to improve your data transfer, your data won't go over the network, everything is within MS data centers.
- More secure and reliable transfer, you could take advantage of ADF Managed Identity for authenticating to your source and destination.
- Since there will be a lot of changes, splits, etc, you can take advantage of ADF's ability to start from where the pipelines failed. On the other hand, in SSIS you'd need to start all over again.
- Better monitoring capabilities
Cons:
- You'd need an infrastructure to develop, deploy, and run your SSIS packages, which will increase implementation time and maintenance overhead.
- You could run SSIS packages using ADF, but it requires a much bigger implementation to host your packages and run them. Also, it'd be more costly.
- If the plan is to use a VM, there is an additional overhead to set up the VM and SSIS. Also, the cost associated with spinning up a new VM and SQL Server.
- Not great monitoring and retry capabilities
CodePudding user response:
"there is no on-prem resources"
This will make using SSIS rather difficult, as you will either need a SQL Server instance somewhere (on-prem or in an Azure VM) to run the SSIS packages, or execute them via ADF, which is really quite expensive. If you spun up an instance just for SSIS, you will also be copying all the data onto and then back off this instance for SSIS to process it. Everton Barciela outlines these issues in more detail within their answer.
I would personally write SQL scripts to transform the data, wrap them within Stored Procedures in a staging area on one of your existing SQL instances. You can then call those from ADF and copy the output into the new tables on the new SQL instance.
In doing this, you are leveraging the significant data transformation power of SQL to do the heavy lifting (of which you likely already have expertise within your org) and leaving ADF to just copy the output over to the new destination. This will mean your ADF pipelines are simple, easy to manage and therefore cost effective.
CodePudding user response:
Why not consider using Azure Database Migration Service. Azure Database Migration Service is a tool that helps you to simplify, guide and automate your database migration to Azure. Easily migrate your data, schema and objects from multiple sources to the cloud at scale.