I am new to Synapse and I need to build a pipeline that deletes files from folders in a hierarchy like the one in the attached image (expected hierarchy). The red half circles mark the files I would like to delete, for example files older than 2 months.
So far I have built a pipeline for a single folder: using a ForEach loop I can get to the files and delete the matching ones, and it works. Since I have about 60-70 folders and even more files, I wanted to go one level up and run the pipeline for each folder. This is where the problem is. When I use a Get Metadata activity on the top folder and a ForEach loop over the folder names, I can only access the folders, not the files inside them. Could someone help me solve this?
(Image: delete pipeline for a single folder using a ForEach loop)
CodePudding user response:
We can achieve this using nested ForEach activities with the help of the Execute Pipeline activity. As mentioned, Get Metadata with wildcards returns all files without folders, and the Delete activity is unable to recognize wildcard folder paths (`Folder/*`).
- I created a similar folder structure for the demo. In my pipeline, I first created an array parameter `req_files` (`sample1.csv` and `sample2.csv`) holding the names of the files to keep.
  Note: If you want to do this dynamically, you can use the Append Variable activity to build the required file names (file09/22 and file08/22).
- I used one Get Metadata activity (`root folder contents`) to get the folder names inside the root folder. I iterate through its output in my ForEach activity (the items value is `@activity('root folder contents').output.childItems`).
- Inside the ForEach, I used another Get Metadata activity (`sub folder contents`) on each subfolder to get the list of files it contains.
- Now I have the folder name and the list of files inside it. I use the Execute Pipeline activity to implement the nested ForEach, because a ForEach activity cannot contain another ForEach directly. Create 3 parameters in a new pipeline called `delete_pipeline` (where I perform the delete): `current_folder`, `folder_files` and `files_needed`.
- Pass the following dynamic content for each of them from the parent pipeline.
current_folder: @item().name
folder_files: @activity('sub folder contents').output.childItems
files_needed: @pipeline().parameters.req_files
- Now in `delete_pipeline`, I have a ForEach loop over the list of files we are passing (the items value is `@pipeline().parameters.folder_files`).
- Inside this ForEach, I use an If Condition activity. This is because I want to delete the files which are not in my `req_files` parameter (the array from the parent pipeline, which we passed to the `files_needed` parameter in `delete_pipeline`). The condition for the If Condition activity is as follows:
@contains(pipeline().parameters.files_needed,item().name)
We need to delete the file only when it is not present in `req_files` (`files_needed`). So, when the condition is false, we perform the delete. I created 2 parameters, `file_namepath_of_file_to_delete` and `file_name_to_delete`, in the dataset I am using for the Delete activity, with the following dynamic content.
file_namepath_of_file_to_delete: Folder/@{pipeline().parameters.current_folder}
file_name_to_delete: @item().name
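
To make the control flow easier to follow, here is a minimal Python sketch of the same logic. It is a simulation, not pipeline code: `store` is a hypothetical in-memory dictionary standing in for the storage account, the function names are made up for illustration, and the deleted path is assumed to be the dataset folder path joined with the file name.

```python
from typing import Dict, List


def delete_pipeline(current_folder: str, folder_files: List[dict],
                    files_needed: List[str], store: Dict[str, List[str]]) -> List[str]:
    """Stands in for the child pipeline: the inner loop over one folder's files."""
    deleted = []
    for item in folder_files:  # ForEach over pipeline().parameters.folder_files
        # If Condition: @contains(pipeline().parameters.files_needed, item().name)
        if item["name"] in files_needed:
            continue  # condition is true: the file is required, keep it
        # Condition is false: delete. Path assumed to be Folder/<current_folder>/<name>.
        store[current_folder].remove(item["name"])
        deleted.append(f"Folder/{current_folder}/{item['name']}")
    return deleted


def parent_pipeline(store: Dict[str, List[str]], req_files: List[str]) -> List[str]:
    """Stands in for the parent pipeline: the outer loop over the subfolders."""
    deleted = []
    # First Get Metadata: childItems of the root folder (the folder names).
    root_child_items = [{"name": folder, "type": "Folder"} for folder in store]
    for item in root_child_items:  # outer ForEach
        # Second Get Metadata: childItems of the current subfolder (its files).
        sub_child_items = [{"name": f, "type": "File"} for f in store[item["name"]]]
        # Execute Pipeline: hand current_folder, folder_files and files_needed to the child.
        deleted += delete_pipeline(item["name"], sub_child_items, req_files, store)
    return deleted


# Hypothetical sample data, loosely modelled on the demo folders.
store = {
    "folder1": ["sample1.csv", "old.csv"],
    "folder2": ["sample2.csv", "sample3.csv", "sample1.csv"],
}
deleted = parent_pipeline(store, ["sample1.csv", "sample2.csv"])
```

After the run, `deleted` holds the paths of the removed files and `store` contains only the required files in each folder. Splitting the inner loop into its own function mirrors why the Execute Pipeline activity is used for the nested loop.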
When I run the pipeline, it keeps the required files and deletes the rest. The following are output images for reference.
- Debug output: https://i.imgur.com/E6GNVHW.png
- My folder after I run the pipeline: https://i.imgur.com/bqN00Dw.png
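
For the note above about building the required file names dynamically, the sketch below shows one way to compute the names for the last two months. It is written in Python only to illustrate the date arithmetic; in the pipeline the same values would be collected with the Append Variable activity. The `file<MM>/<yy>` pattern is an assumption taken from the example names in the note, and `recent_month_names` is a hypothetical helper.

```python
from datetime import date
from typing import List


def recent_month_names(today: date, months_to_keep: int = 2,
                       prefix: str = "file") -> List[str]:
    """Return names such as 'file09/22' for the current month and the ones before it."""
    names = []
    year, month = today.year, today.month
    for _ in range(months_to_keep):
        # Assumed naming pattern: <prefix><two-digit month>/<two-digit year>.
        names.append(f"{prefix}{month:02d}/{year % 100:02d}")
        # Step back one month, rolling over into December of the previous year.
        month -= 1
        if month == 0:
            month, year = 12, year - 1
    return names


# Example: a run in mid-September 2022 keeps the September and August names.
req_files = recent_month_names(date(2022, 9, 15))
```

Here `req_files` evaluates to `["file09/22", "file08/22"]`, which matches the example names in the note. Everything not in this list would then be deleted by the If Condition logic described above.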