I have a blob storage account named azureblob11. It has a container named source.
The source container contains two levels of subfolders.
The folder structure looks like this:
source (container)
|--a (folder)
   |--ana (sub-folder)
   |--hem (sub-folder)
   |--thg (sub-folder)
   |--oud (sub-folder)
My goal is to copy all four subfolders (ana, hem, thg, oud) along with their data to the container level, so that each subfolder becomes a new container holding the same data as is.
The expected result looks like this:
source (container)
|--a (folder)
   |--ana (sub-folder)
   |--hem (sub-folder)
   |--thg (sub-folder)
   |--oud (sub-folder)
ana (container)
hem (container)
thg (container)
oud (container)
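To make the intended mapping concrete, here is a minimal Python sketch of how a blob name inside the source container would translate into a new container and blob name. The function name and the file names are hypothetical and only illustrate the layout above; this is not ADF code.

```python
from typing import Optional, Tuple


def target_location(blob_name: str, parent: str = "a") -> Optional[Tuple[str, str]]:
    """Map a blob name in 'source' to (new container, blob name in that container).

    Returns None for blobs that are not below '<parent>/<sub-folder>/'.
    """
    parts = blob_name.split("/")
    # Expect at least: parent folder / sub-folder / file name
    if len(parts) < 3 or parts[0] != parent:
        return None
    # The sub-folder becomes the container; the rest of the path is kept as is
    return parts[1], "/".join(parts[2:])


# Hypothetical blob names, for illustration only
for name in ["a/ana/report.csv", "a/hem/2021/data.bin", "other/x.txt"]:
    print(name, "->", target_location(name))
```

The subfolder names here (ana, hem, thg, oud) are all three lowercase letters, so they are also valid container names.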
To achieve this, I used a Copy data pipeline in ADF.
Selected the source:
-> selected New for the source dataset
-> chose Azure Blob Storage as the type
-> chose Binary as the format
-> chose movecontainer as the linked service
Selected the sink:
-> selected New
-> chose Azure Blob Storage as the type
-> chose Binary as the format
I am stuck on how to configure the parameters or the dynamic content for creating the containers.
Answer:
Use a Get Metadata activity to get the list of subfolder names, then pass that list to a ForEach activity that copies each folder to the sink, as described below.
(Screenshot: source structure in Azure Data Lake)
ADF pipeline:
- Using the Get Metadata activity, get the list of subfolder names under the folder "a" in the container "source".
• Create a dataset for the source path and set it as the dataset of the Get Metadata activity.
• Select Child items under Field list in the activity settings.
• The Get Metadata output then contains a childItems array with the name and type of each subfolder.
- Pass this output to a ForEach activity.
• Under the Items property, add the child items from the Get Metadata output.
- Add a Copy data activity inside the ForEach activity.
• Create a source dataset and parameterize the subfolder (directory) path in it.
• In the Copy data activity, pass the current ForEach item name to the dataset parameter value in the source properties.
• Create a sink dataset and parameterize the sink container path in it.
• In the Copy data activity's sink dataset, pass the current ForEach item name to the sink parameter.
- The subfolders and files from each subfolder are copied to the sink. A sink container with the current item name is created if it does not already exist.
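The steps above can be sketched as pipeline JSON. The Python snippet below only builds and prints a rough approximation of that JSON; it does not call Azure. All activity, dataset, and parameter names (GetSubfolders, SourceFolderA, SourceSubfolderBinary, SinkContainerBinary, subfolder, containerName) are hypothetical placeholders, and the JSON generated by the ADF UI may differ in detail.

```python
import json

# Hypothetical names; replace with the names used in your own factory.
GET_METADATA = "GetSubfolders"
SOURCE_DS = "SourceSubfolderBinary"  # assumed folder path: @concat('a/', dataset().subfolder)
SINK_DS = "SinkContainerBinary"      # assumed container: @dataset().containerName


def expression(value):
    """Wrap a string as ADF dynamic content."""
    return {"value": value, "type": "Expression"}


def dataset_ref(name, parameters=None):
    """Build a dataset reference, optionally with dataset parameter values."""
    ref = {"referenceName": name, "type": "DatasetReference"}
    if parameters:
        ref["parameters"] = parameters
    return ref


# Copy activity: the current ForEach item name feeds both dataset parameters.
copy_activity = {
    "name": "CopySubfolder",
    "type": "Copy",
    "inputs": [dataset_ref(SOURCE_DS, {"subfolder": expression("@item().name")})],
    "outputs": [dataset_ref(SINK_DS, {"containerName": expression("@item().name")})],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {"type": "AzureBlobStorageReadSettings", "recursive": True},
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
        },
    },
}

pipeline = {
    "name": "CopySubfoldersToContainers",
    "properties": {
        "activities": [
            {
                # Lists the sub-folders under source/a
                "name": GET_METADATA,
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": dataset_ref("SourceFolderA"),
                    "fieldList": ["childItems"],
                },
            },
            {
                # Runs the copy once per child item
                "name": "ForEachSubfolder",
                "type": "ForEach",
                "dependsOn": [
                    {"activity": GET_METADATA, "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "items": expression(f"@activity('{GET_METADATA}').output.childItems"),
                    "activities": [copy_activity],
                },
            },
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The key pieces are the two expressions: `@activity('<Get Metadata activity name>').output.childItems` for the ForEach Items property, and `@item().name` for the source and sink dataset parameters.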