I have two CSV files, and some records appear in both. I want to remove the duplicates and keep only the unique records. How can I do this with Apache NiFi?
Thank you!
input1.csv:
id,surname,name
1,ali,veli
2,mert,tolga
input2.csv:
id,surname,name
1,ali,veli
3,ahmet,ozan
output.csv:
id,surname,name
1,ali,veli
2,mert,tolga
3,ahmet,ozan
Answer:
You can do this with record-based processing: use MergeRecord to merge the two CSV files into a single FlowFile, then use the QueryRecord processor to deduplicate with a query like:
SELECT * FROM FLOWFILE
INTERSECT
SELECT * FROM FLOWFILE
Note that SELECT DISTINCT FROM FLOWFILE will not work. QueryRecord uses Apache Calcite for its SQL, so see the Calcite docs for the supported syntax.
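To see why the INTERSECT query removes duplicates, here is a minimal sketch that runs the same statement outside NiFi. It is only an illustration, under these assumptions: it uses Python's built-in sqlite3 instead of Calcite, an in-memory table named flowfile stands in for the merged FlowFile, and the helper names read_rows and dedupe are made up for this example. An ORDER BY is added only to make the result order predictable.

```python
import csv
import io
import sqlite3

# Sample data copied from the question.
INPUT1 = "id,surname,name\n1,ali,veli\n2,mert,tolga\n"
INPUT2 = "id,surname,name\n1,ali,veli\n3,ahmet,ozan\n"


def read_rows(text):
    """Parse CSV text and return the data rows as tuples (header skipped)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return [tuple(row) for row in reader]


def dedupe(*csv_texts):
    """Load all inputs into one table (the 'merge' step), then deduplicate."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the merged FlowFile; this is SQLite, not Calcite.
    conn.execute("CREATE TABLE flowfile (id TEXT, surname TEXT, name TEXT)")
    for text in csv_texts:
        conn.executemany("INSERT INTO flowfile VALUES (?, ?, ?)", read_rows(text))
    # INTERSECT (without ALL) has set semantics, so repeated rows collapse to one.
    rows = conn.execute(
        "SELECT * FROM flowfile INTERSECT SELECT * FROM flowfile ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


unique_rows = dedupe(INPUT1, INPUT2)
for row in unique_rows:
    print(",".join(row))
```

The record 1,ali,veli is inserted twice but comes back once, because INTERSECT returns a set of rows rather than a bag. In NiFi the MergeRecord and QueryRecord processors handle the merging and querying, so none of this Python is needed in the actual flow.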