First off, I know similar solutions exist, but this problem is somewhat different.
I have a process that produces multiple CSV files based on user input n (where n > 1 and n < 100), meaning the user can generate any number of files.
These files all have the same columns:
file1 -> Col1 Col2 Col3 Col4 Col5 output
file2 -> Col1 Col2 Col3 Col4 Col5 output
file3 -> Col1 Col2 Col3 Col4 Col5 output
These files are stored in Azure Blob Storage under some data path.
I want to read all the files and produce a result file like this:
Col1 Col2 Col3 Col4 Col5 output1 output2 output3
Is there any way of doing this dynamically, i.e. without creating multiple sources in the data flow and joining them? The number of files generated depends on the user, so I cannot hardcode it.
CodePudding user response:
There are multiple steps in this solution: first add the filePath as a column, next rank the rows based on the filePath, then apply a pivot operation to the table. The implementation is based on three major steps:
- Source of the dataset (the list of CSV files)
- Rank – rank the rows on the filePath column
- Pivot – pivot the row values into columns, group by the key columns, and finally aggregate the data
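The same three steps can be sketched outside the data flow, for example with pandas. This is only an illustrative sketch: it assumes the CSVs have already been copied out of blob storage to a local path, and the file names and the two key columns (`Col1`, `Col2`) below are made up for the demo rather than taken from the original files.

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in for the user-generated files: three sample CSVs in a temp folder.
tmp = tempfile.mkdtemp()
for i in (1, 2, 3):
    pd.DataFrame(
        {"Col1": ["a", "b"], "Col2": [1, 2], "output": [i * 10, i * 20]}
    ).to_csv(os.path.join(tmp, f"file{i}.csv"), index=False)

# Step 1: read every matching file and add its path as a column.
frames = []
for path in sorted(glob.glob(os.path.join(tmp, "file*.csv"))):
    df = pd.read_csv(path)
    df["filePath"] = path
    frames.append(df)
rows = pd.concat(frames, ignore_index=True)

# Step 2: rank the distinct file paths (1, 2, 3, ...) and build the
# target column name for each file's rows.
rows["outCol"] = "output" + (
    rows["filePath"].astype("category").cat.codes + 1
).astype(str)

# Step 3: pivot each file's output value into its own column, grouping
# on the shared key columns.
result = rows.pivot_table(
    index=["Col1", "Col2"], columns="outCol", values="output", aggfunc="first"
).reset_index()
result.columns.name = None

print(list(result.columns))  # ['Col1', 'Col2', 'output1', 'output2', 'output3']
```

Note that because the pivoted column names are strings, they sort lexicographically, so with ten or more files `output10` would appear before `output2`; zero-padding the rank would fix that if it matters.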
CodePudding user response:
The solution I followed is here.