Update the source Dataset of Azure Data factory built-in copy task


I am new to Azure Data Lake; currently I am using ADLS Gen2. I have configured pipelines that copy single source datasets to destination datasets, and they work fine. I then configured a pipeline to upload / ingest a whole database from on-prem to ADLS Gen2.

Here is the process: Data Factory Studio -> Ingest -> Built-in copy task -> SQL Server as the data store with a configured integration runtime -> Select all existing tables -> and so on.

Then, when I trigger this pipeline, it successfully uploads all tables to the container as separate files (named after the source tables).

And when I update data in one of the source tables, the data in the corresponding destination file is updated successfully too.

The problem is that when I add a new table to the database and then trigger the pipeline, this new table is not uploaded. Is there a way to update the source dataset so that it includes the new table?

I have gone through all the properties of the source dataset and of the activity in the pipeline, and I have searched for a solution, but I am stuck on this scenario.

CodePudding user response:

To dynamically get the list of all tables and copy them to your data lake storage account, you can use the following procedure:

I used a Script activity against my Azure SQL database (for demonstration) to get the list of tables in the database, using the following query as suggested by @Scott Mildenberger:

SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'
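
The Script activity exposes the query result in its activity output, which is what the ForEach loop below iterates over. With the query above, that output looks roughly like this (the schema and table names here are purely illustrative):

{
    "resultSetCount": 1,
    "resultSets": [
        {
            "rowCount": 3,
            "rows": [
                { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Customers" },
                { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Orders" },
                { "TABLE_SCHEMA": "sales", "TABLE_NAME": "Invoices" }
            ]
        }
    ]
}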


  • Now, in a ForEach activity, set the Items value to the output rows of the above Script activity, i.e. @activity('get tables').output.resultSets[0].rows. Inside the ForEach loop, use a Copy activity where the source is your database dataset and the sink is your ADLS dataset (see the pipeline sketch at the end of this answer).

  • I created 2 dataset parameters called schema and table on the source dataset and used them as shown below:


  • In the source settings of the Copy activity, I gave these dataset parameters the following values:
schema: @item().TABLE_SCHEMA
table: @item().TABLE_NAME


  • For the ADLS sink dataset, I created 2 parameters, also called schema and table, and used them to build the file name dynamically as @{dataset().schema}@{dataset().table}.csv (see the dataset sketches after this list).


  • I supplied the values for these sink dataset parameters in the sink settings of the Copy activity, in the same way as above.


  • When I run the pipeline, it gives the desired results. Even when you add a new table to the database, the INFORMATION_SCHEMA query picks it up and the table is copied as a new file to your ADLS container.
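
For reference, here is a minimal sketch of what the two parameterized datasets can look like when exported as JSON. The dataset names (SqlTableDS, AdlsCsvDS), the linked service placeholders and the container name are assumptions for illustration only, not values from the setup above; if your source is an Azure SQL database rather than on-prem SQL Server, use the corresponding Azure SQL dataset type instead.

Source dataset, with the schema and table parameters driving the table reference:

{
    "name": "SqlTableDS",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": { "referenceName": "<your SQL Server linked service>", "type": "LinkedServiceReference" },
        "parameters": {
            "schema": { "type": "string" },
            "table": { "type": "string" }
        },
        "typeProperties": {
            "schema": { "value": "@dataset().schema", "type": "Expression" },
            "table": { "value": "@dataset().table", "type": "Expression" }
        }
    }
}

Sink dataset, with the same two parameters used to build the file name:

{
    "name": "AdlsCsvDS",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "<your ADLS Gen2 linked service>", "type": "LinkedServiceReference" },
        "parameters": {
            "schema": { "type": "string" },
            "table": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "<your container>",
                "fileName": { "value": "@{dataset().schema}@{dataset().table}.csv", "type": "Expression" }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}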

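And here is a similarly hedged sketch of the pipeline's activities section, wiring the Script activity, the ForEach loop and the Copy activity together. The activity names and dataset references match the hypothetical names used in the dataset sketches above:

"activities": [
    {
        "name": "get tables",
        "type": "Script",
        "linkedServiceName": { "referenceName": "<your SQL Server linked service>", "type": "LinkedServiceReference" },
        "typeProperties": {
            "scripts": [
                {
                    "type": "Query",
                    "text": "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'"
                }
            ]
        }
    },
    {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [ { "activity": "get tables", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "items": { "value": "@activity('get tables').output.resultSets[0].rows", "type": "Expression" },
            "activities": [
                {
                    "name": "CopyTable",
                    "type": "Copy",
                    "inputs": [
                        {
                            "referenceName": "SqlTableDS",
                            "type": "DatasetReference",
                            "parameters": { "schema": "@item().TABLE_SCHEMA", "table": "@item().TABLE_NAME" }
                        }
                    ],
                    "outputs": [
                        {
                            "referenceName": "AdlsCsvDS",
                            "type": "DatasetReference",
                            "parameters": { "schema": "@item().TABLE_SCHEMA", "table": "@item().TABLE_NAME" }
                        }
                    ],
                    "typeProperties": {
                        "source": { "type": "SqlServerSource" },
                        "sink": {
                            "type": "DelimitedTextSink",
                            "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                            "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
                        }
                    }
                }
            ]
        }
    }
]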
