Copy Files from a folder to multiple folders based on the file name in Azure Data Factory

Time:01-25

I have a parent folder in ADLS Gen2 called Source that contains a number of subfolders, and these subfolders contain the actual data files, as shown in the example below...

Source:

Folder Name: 20221212

A_20221212.txt B_20221212.txt C_20221212.txt

Folder Name: 20221219

A_20221219.txt B_20221219.txt C_20221219.txt

Folder Name: 20221226

A_20221226.txt B_20221226.txt C_20221226.txt

How can I copy the files from these subfolders into name-specific folders (creating a folder if it does not exist) using Azure Data Factory? Please see the example below...

Target:

Folder Name: A

A_20221212.txt A_20221219.txt A_20221226.txt

Folder Name: B

B_20221212.txt B_20221219.txt B_20221226.txt

Folder Name: C

C_20221212.txt C_20221219.txt C_20221226.txt

I would really appreciate your help.

CodePudding user response:

I reproduced the scenario above and got the results below.

If the folder directories are all at the same level, you can follow the procedure below, which uses the Get Metadata activity.

This is my source folder structure.

data
    20221212
        A_20221212.txt
        B_20221212.txt
        C_20221212.txt
    20221219
        A_20221219.txt
        B_20221219.txt
        C_20221219.txt
    20221226
        A_20221226.txt
        B_20221226.txt
        C_20221226.txt

The source dataset is the one referenced as sourcetxt in the pipeline JSON below.

Pass this dataset to a Get Metadata activity and add Child items (childItems) to its field list.

Then pass the childItems array from the Get Metadata activity to a ForEach activity. Inside the ForEach, I used a Set variable activity to store the target folder name, which is the part of the file name before the first underscore:

@split(item().name,'_')[0]
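To make the expression concrete, here is a minimal Python sketch of the same idea: take the file name and keep the text before the first underscore. The function name target_folder is hypothetical and exists only for illustration; the file names come from the example folders above.

```python
def target_folder(file_name: str) -> str:
    """Return the text before the first underscore in file_name."""
    # Mirrors the idea of split(name, '_')[0]: the first piece of the split.
    return file_name.split("_")[0]


# File names taken from the example source folders.
for name in ["A_20221212.txt", "B_20221219.txt", "C_20221226.txt"]:
    print(name, "->", target_folder(name))
```

In this Python version, a name that contains no underscore is returned unchanged, so such a file would end up in a folder named after the whole file name. It may be worth checking how your own file names behave before relying on the prefix.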

Now add a Copy activity. In its source, use a wildcard path: set the wildcard folder path to * and the wildcard file name to @item().name.
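As a rough local analogy for what a folder wildcard of * combined with a literal file name selects, the sketch below uses Python's fnmatch module on relative paths. This is only a stand-in: fnmatch's matching rules are Python's own and are not claimed to be identical to Data Factory's wildcard handling, and the helper name matches is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(relative_path: str, file_name: str) -> bool:
    """True if relative_path is file_name inside some folder."""
    # "*" plays the role of the folder wildcard; file_name is matched literally
    # here because the example names contain no wildcard characters.
    return fnmatchcase(relative_path, f"*/{file_name}")


# Relative paths taken from the example source structure.
paths = [
    "20221212/A_20221212.txt",
    "20221212/B_20221212.txt",
    "20221219/A_20221219.txt",
]

selected = [p for p in paths if matches(p, "A_20221212.txt")]
print(selected)
```

The point of the sketch is that a literal file name narrows the selection to that one file, whichever folder it sits in.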

For the sink, create dataset parameters (folder_name and file_name in my pipeline) and supply their values in the Copy activity sink: folder_name takes @variables('folder_name') and file_name takes @item().name.

My pipeline JSON:

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "sourcetxt",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Copy data1",
                            "type": "Copy",
                            "dependsOn": [
                                {
                                    "activity": "Set variable1",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "wildcardFolderPath": "*",
                                        "wildcardFileName": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        },
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "sourcetxt",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "targettxts",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "folder_name": {
                                            "value": "@variables('folder_name')",
                                            "type": "Expression"
                                        },
                                        "file_name": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        },
                        {
                            "name": "Set variable1",
                            "type": "SetVariable",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "folder_name",
                                "value": {
                                    "value": "@split(item().name,'_')[0]",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "variables": {
            "folder_name": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
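For checking the expected layout outside Data Factory, the intended end result (every file placed in a folder named after its prefix, with the folder created when missing) can be sketched on a local file system. This is not how the pipeline executes; it is a small Python illustration using only the standard library, and the function name distribute_by_prefix and the "*/*.txt" pattern are assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import List


def distribute_by_prefix(source_root: Path, target_root: Path) -> List[Path]:
    """Copy source_root/<folder>/<prefix>_<rest>.txt to target_root/<prefix>/."""
    copied = []
    # Look one level down only, matching the "all folders at the same level" case.
    for src in sorted(source_root.glob("*/*.txt")):
        prefix = src.name.split("_")[0]
        dest_dir = target_root / prefix
        # Create the target folder if it does not exist yet.
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Calling distribute_by_prefix(Path("Source"), Path("Target")) on a local copy of the example folders would place A_20221212.txt, A_20221219.txt and A_20221226.txt under Target/A, and likewise for B and C.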

Result:

[screenshot]