Azure data factory with a copy activity using a binary dataset fails to copy folder contents if para

Time:01-10

In my Azure Data Factory, I need to copy data from an SFTP source that organizes its data into date-based directories with the following hierarchy: year -> month -> date -> file.

I have created a linked service and a binary dataset where the dataset "filesystem" points to the host and "Directory" points to the folder that contains the year directories. Ex: host/exampledir/yeardir/, where yeardir contains the year directories.

When I hard-code the folder "2015" in the dataset, the copy activity copies the entire 2015 folder. However, if I define a parameter for the directory and pass the same folder path in from the copy activity, it creates a file called "2015" in my blob storage that contains no data.

My current workaround is a nested sequence of Get Metadata activities and ForEach loops that drill into each folder and subfolder and copy the individual files. However, the desired result is to have the single binary dataset copy each folder without the need for Get Metadata.

Is this possible within the scope of the data factory?

edit:

[Screenshot: manual filepath that works]

[Screenshot: parameterized filepath]

[Screenshot: properties used in copy activity]

To add further context: I have tried manually writing the filepath into the copy activity, as shown in the screenshot. I have also tried variables, dynamic content for the parameter (using a base filepath and concat), and putting the base filepath into the dataset alongside @dataset().filePath. None of these approaches has worked so far; each either copies nothing or creates the empty file I mentioned earlier.

The sink is a binary dataset linked to Azure Data Lake Storage Gen2.

[Screenshot: sink filepath]

CodePudding user response:

Since hard-coding exampledir/yeardir/2015 worked for you, and you want to copy all the folders present in exampledir/yeardir, you can follow the procedure below:

  • I used a Get Metadata activity to get the child items of the folder exampledir/yeardir/ (in my demonstration, the path is 'maindir/yeardir').

  • This returns all the year folders present. My example contains only 2020 and 2021.

  • Next, a single ForEach activity is enough. I set its Items value to the child items output of the Get Metadata activity and placed the copy activity directly inside it:
@activity('Get Metadata1').output.childItems

  • Inside the ForEach, I have my Copy data activity. For both the source and sink datasets, I created a dataset parameter for the path. I gave the following dynamic content for the source path:
maindir/yeardir/@{item().name}

  • For sink, I have given the output directory as follows:
outputDir/@{item().name}

  • To summarize: since giving the path manually as exampledir/yeardir/2015 worked, we get the list of year folders with the Get Metadata activity, loop through them, and copy each folder with the source path exampledir/yeardir/<current_iteration_year_folder>.

  • With the sink path given as above, each year folder is copied along with its contents.
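The path handling in the steps above can be modeled outside Data Factory with a small Python sketch. It is not ADF code and only illustrates how each ForEach iteration would resolve the source and sink folder paths from the Get Metadata child items. The function name plan_copies is hypothetical, and skipping entries whose type is not "Folder" is an added assumption; the procedure above passes every child item straight to the copy activity.

```python
from typing import Dict, List


def plan_copies(child_items: List[Dict[str, str]],
                source_root: str,
                sink_root: str) -> List[Dict[str, str]]:
    """Model one ForEach pass: build a source/sink folder pair per child item.

    child_items mirrors the shape of the Get Metadata childItems output,
    i.e. a list of {"name": ..., "type": "Folder" | "File"} entries.
    """
    plans = []
    for item in child_items:
        # Assumption (not in the original answer): ignore loose files so that
        # only year folders are handed to the copy step.
        if item.get("type") != "Folder":
            continue
        name = item["name"]
        plans.append({
            # Same string the expression maindir/yeardir/@{item().name} yields.
            "source": f"{source_root}/{name}",
            # Same string the expression outputDir/@{item().name} yields.
            "sink": f"{sink_root}/{name}",
        })
    return plans


# Sample data matching the demonstration (only 2020 and 2021 exist).
child_items = [
    {"name": "2020", "type": "Folder"},
    {"name": "2021", "type": "Folder"},
]

plans = plan_copies(child_items, "maindir/yeardir", "outputDir")
for plan in plans:
    print(plan["source"], "->", plan["sink"])
```

With the sample data, the sketch pairs maindir/yeardir/2020 with outputDir/2020 and maindir/yeardir/2021 with outputDir/2021, which are the values the two dataset parameters receive on each iteration.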