Home > Software engineering >  Azure Data Factory Data Flow: how to filter input column with multiple values
Azure Data Factory Data Flow: how to filter input column with multiple values

Time:12-18

What I am trying to do is basically something that emulates SQL WHERE IN CLAUSE in a data flow. I want to pass comma separated string of values into my data flow, i.e ptf_proc_link_id = "A, B, C".

enter image description here

Inside of the data flow, I want to use those values, to filter one of my input columns:

enter image description here

It would be equal to writing in SQL SELECT * FROM my_delta_table where ptf_proc_link_id in (A, B, C).

However, data flow documentation only hints that I can filter single values i.e ptf_proc_link_id = A. I have tried to use intersect, but then both parameters needs to be arrays, and in my case one of them is a column. I could do something like ptf_proc_link_id = A OR ptf_proc_link_id = B OR ptf_proc_link_id = C, but I never know how many input values I would have, so I cannot hardcode it.

It seems like a very simple use case for a data flow, so I would be really thankful if someone could explain me what the correct approach would be.

Thank you!

CodePudding user response:

You can use in() function in ADF data flow to filter multiple values.

Syntax: in(array of items,item to find)

  • I tried to repro this with sample input data.
key class name mark DOB
1 1 Arjuna 50 1/2/2015
2 1 Basanta 47 5/1/2015
3 1 Subala 54 5/6/2015
4 2 Gandharva 60 1/2/2014
5 2 Ujjvala 55 9/2/2014
6 2 Sanadhana 64 1/12/2014
7 3 Sridama 75 1/2/2013
8 3 Sudama 80 13/12/2013
9 3 Vasu 81 1/12/2013

enter image description here img:1 Source data preview.

  • I tried to filter the records with key=2,4,6

  • Filter condition can be given as in(['2','4','6'], key) or in(array('2','4','6'), key) enter image description here img:2 Filter condition settings

  • Output data of the filter Transformation:

enter image description here img:3 Filter Transformation data preview

  • You can also create a dataflow parameter of array type and give that parameter in the filter condition of filter transformation. enter image description here img: 4 dataflow parameters

  • Filter condition while using parameter will be in($parameter1,key)

Reference: Microsoft document on data flow expression - in

  • Related