I need some help transforming xlsx and csv files that are located in Azure Blob Storage (removing some rows, renaming some columns) and saving them in ADLS. Could someone give me some ideas or steps on how to complete this task? That would be really helpful. Thanks for any help, I really appreciate it.
P.S.: I am a complete fresher in cloud. I switched from development to cloud recently and have some basic ideas about ADF, pipelines, activities, blob storage and other basic stuff.
CodePudding user response:
Writing a transformed xlsx file to Azure Data Lake Storage with Azure Data Factory is not possible, because the Excel format is supported only as a source, not as a sink.
A workaround is to write the file using Python code. Python code is given in this
Step2: Select Data flow.
Step3: Add a Source (CSV in Blob Storage) and a Sink (Azure Data Lake Storage).
Step4: I have taken a sample CSV file as shown below. It has 2 columns, TestCol1 and TestCol2.
Step5: To rename these 2 columns, I have used a Select transformation in the data flow. Here I have renamed both columns.
As shown in the above screenshots, you can rename columns this way.
Step6: Now you can run the pipeline and store the data in Azure Data Lake Storage.
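If you want to check the rename and row-removal logic outside of Data Factory first, here is a minimal sketch in plain Python. It is not what the data flow runs internally, only the same idea using the standard csv module. TestCol1 and TestCol2 come from the sample file above. The new column names (CustomerId, CustomerName), the sample rows and the "drop rows with an empty TestCol2" rule are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical mapping: the old names are from the sample file,
# the new names are invented for this example.
RENAME_MAP = {"TestCol1": "CustomerId", "TestCol2": "CustomerName"}


def transform_csv(text, rename_map, keep_row):
    """Rename header columns and keep only rows for which keep_row(row) is true."""
    reader = csv.DictReader(io.StringIO(text))
    old_names = reader.fieldnames
    # Columns that are not in rename_map keep their original name.
    new_names = [rename_map.get(name, name) for name in old_names]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_names)
    for row in reader:
        if keep_row(row):
            writer.writerow([row[name] for name in old_names])
    return out.getvalue()


# Made-up sample data with one row that should be removed.
sample = "TestCol1,TestCol2\n1,alpha\n2,\n3,gamma\n"

# Assumed rule: drop rows whose TestCol2 value is empty.
result = transform_csv(sample, RENAME_MAP, lambda row: row["TestCol2"] != "")
print(result)
```

The row filter is passed in as a function so that you can try different removal rules without touching the rename part.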
There are also a number of other options for transforming a CSV file, as shown in the screenshot below. For example, a Filter transformation can remove rows.
For more data transformation ideas, you can follow these 2 links: