How to convert this CSV file input to the given output in ADLS using ADF?
Input data:
order_id,city,country
L10,Sydney,Australia
L11,Annecy,France
L12,Montceau,France
L13,Paris,France
L14,Montceau,Canada
L15,Ste-Hyacinthe,Canada
Output data:
COUNTRY,CITY,TOTAL_Order
Australia,Sydney,1
Australia,Total,1
Canada,Montréal,1
Canada,Ste-Hyacinthe,1
Canada,Total,2
France,Annecy,1
France,Montceau,1
France,Paris,1
France,Total,3
Total,Total,6
I want to find the count of order ids city wise and country wise using Data Flow. This is similar to roll-up aggregation.
CodePudding user response:
Take three aggregate transforms in dataflow to do this. First is to calculate the count of
orderid
for every country and city combination. Second aggregate transform is to calculate the count oforderid
for every country. Third aggregate transform is to calculate the countorderid
for the full table. Below are the detailed steps.Same input data is taken as source.
img:1 source data preview
- Create two new additional branches by clicking symbol near to Source transformation and click new branch.
- In each branch add aggregate transformation.
- Aggregate transformation1 settings:
group by : country, city
aggregates: total_order=count(order_id)
img:2 aggregate transform1 data preview
- Aggregate transorm2 settings:
group by: country
aggregates: total_order=count(order_id)
img:3 aggregate transform 2 data preview.
- Aggregate transorm3 settings: No column in group by.
group by:
aggregates: total_order=count(order_id)
img:4 aggregate transform3 data preview.
- Next step is to union all these tables. Since all of these are not in the same structure, Add derived columns transformation to aggregate2 and aggregate3 and create columns with empty string is added.
- Join aggregate1,derived1 and derived2 transformations data using Union transformation.
img:5 Data preview after all transformations.
img: 6 Complete dataflow with all transformations.