I have a parent folder in ADLS Gen2 called Source which has number of subfolders and these subfolders contain the actual data files as shown in in the below example...
***Source: ***
Folder Name: 20221212
A_20221212.txt B_20221212.txt C_20221212.txt
Folder Name: 20221219
A_20221219.txt B_20221219.txt C_20221219.txt
Folder Name: 20221226
A_20221226.txt B_20221226.txt C_20221226.txt
How can I copy files from subfolders to name specific folders (should create a new folder if it does not exist) using Azure Data Factory, please see the example below...
***Target: ***
Folder Name: A
A_20221212.txt A_20221219.txt A_20221226.txt
Folder Name: B
B_20221212.txt B_20221219.txt B_20221226.txt
Folder Name: C
C_20221212.txt C_20221219.txt C_20221226.txt
Really appreciate your and help.
CodePudding user response:
I have reproduced the above and got below results.
You can follow the below procedure using Get Meta data activity if you have the folder directories at same level.
This is my source folder structure.
data
20221212
A_20221212.txt
B_20221212.txt
C_20221212.txt`
20221219
A_20221219.txt
B_20221219.txt
C_20221219.txt
20221226
A_20221226.txt
B_20221226.txt
C_20221226.txt
Source dataset:
Give this to Get Meta data activity and use ChildItems
.
Then Give the ChildItems array from Get Meta data activity to a ForEach activity. Inside ForEach I have used set variable for storing folder name.
@split(item().name,'_')[0]
Now, use copy activity and in source use wild card path like below.
For sink create dataset parameters and give it copy activity sink like below.
My pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "sourcetxt",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Get Metadata1').output.childItems",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Set variable1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"wildcardFolderPath": "*",
"wildcardFileName": {
"value": "@item().name",
"type": "Expression"
},
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "sourcetxt",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "targettxts",
"type": "DatasetReference",
"parameters": {
"folder_name": {
"value": "@variables('folder_name')",
"type": "Expression"
},
"file_name": {
"value": "@item().name",
"type": "Expression"
}
}
}
]
},
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "folder_name",
"value": {
"value": "@split(item().name,'_')[0]",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"folder_name": {
"type": "String"
}
},
"annotations": []
}
}
Result: