Home > Back-end >  Azure Data Factory uploads data to database when pipeline run fails
Azure Data Factory uploads data to database when pipeline run fails

Time:01-26

I have a pipeline running in my ADF where I take fields from a json file and save them to multiple tables in my database. The pipeline starts running when a file is uploaded to a container in my storage account, and after the run the files are deleted. The problem at hand is that when the pipeline fails for whatever reason, the files don't get deleted (because it doesn't make it to the end of the run), and for some reason the SQL records are still saved to some tables which means my tables get spammed with data. Is there any way to only upload the data if the pipeline succeeded?

I didn't see any option that allows me to do that and have to manually take care of the problem when it happens

CodePudding user response:

Usually when we trigger one activity from another, we use the success route. You can also use a catch block in case of an activity failure to do some other activity. Refer to this documentation.

So incase the failures happen in a certain activity, use this method to execute another activity which will clear out your SQL records.

CodePudding user response:

Adding to @Anupam Chand's answer,

There is no direct way to roll-back or delete records from SQL table which got inserted in the current pipeline run. You should have some flag or watermark column field in the SQL table to identify the records which got inserted in current run. During failure of the pipeline activities, you can add the logic of deleting records in the failure path. I tried to repro this. Below is the approach.

  • Initially a variable of string type is taken, and it is set using set variable activity to get the time at which the pipeline starts executing.

enter image description here

  • Then copy activity is added. While copying, add the time when the records get inserted in SQL table.

enter image description here

  • If this activity is failed, script activity is added to the failure path of copy activity.

enter image description here

  • Then Script activity is added to delete the records. Query is given as
Delete from tgt_table where Inserted_Datetime >='@{variables('initial_time')}'

By this way, we can roll-back the records which got partially loaded during pipeline failure.

  • Related