I have a set of excel files inside ADLS. The format looks similar to the one below:
The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.
Can any help me with what should be expressions in the Derived column / Select?
CodePudding user response:
My Excel file:
Source Data set settings (give A5 in range and select first row as header): Next, add a filter transformation with below expression
!startsWith(sno,'dummy') && sno!=''
this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''
Final Preview after filter: