Main problem:
I need to orchestrate the run of Python scripts using an Azure Data Factory pipeline.
What I have tried:
- Databricks: This solution is costly, a little slow (clusters need to spin up), and it requires writing my code in a notebook.
- Batch activity from ADF: It is also costly and a little slow. I don't have to write my code in a notebook, but I do have to manually upload it to a storage account, which is not great when debugging or updating.
My question:
Is there a way to run code from an Azure Repos (or GitHub) repository directly from Data Factory? Something like the Batch activity, but reading the code from the repo itself instead of from a storage account?
Thanks for your help
CodePudding user response:
Based on the document "Pipelines and activities in Azure Data Factory", Azure Repos and GitHub repositories are not supported as source or sink data stores for ADF pipelines. So it is not possible to run code directly from a git repository in an ADF pipeline.
However, ADF has a Source control option that lets you configure a Git repository with either Azure Repos or GitHub. You can then set up CI/CD pipelines on Azure DevOps that integrate with ADF, and those CI/CD pipelines can run code directly from the git repository.
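As a minimal sketch, an Azure DevOps pipeline definition (azure-pipelines.yml) committed to the repo could check out the code and run a Python script on a Microsoft-hosted agent. The branch name, script path (scripts/my_script.py), and requirements file below are assumptions for illustration, not part of your actual repo layout:

```yaml
# azure-pipelines.yml: minimal sketch. Branch name, script path,
# and requirements file are placeholder assumptions.
trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'   # Microsoft-hosted agent, no cluster to spin up

steps:
  - checkout: self           # fetches the code straight from the git repo

  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.10'

  - script: |
      pip install -r requirements.txt
      python scripts/my_script.py
    displayName: 'Run Python script from the repo'
```

Because the agent checks out the repository itself, there is no manual copy to a storage account; whatever is currently committed is what runs.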
For more details, see the document "CI/CD in ADF".