Home > Back-end >  Apache POI : Duplicate detection of rows using multiple columns
Apache POI : Duplicate detection of rows using multiple columns

Time:08-12

I am trying to highlight duplicate rows in excel for the subset of columns using java. I have created a conditional format rule and applied the COUNTIFS formula for identifying duplicates.

The formula works for the first two subset columns I4:J7. If I extend the formula to the next column K7, duplicate- detection is not working.

        SheetConditionalFormatting sheetCF = sheet.getSheetConditionalFormatting();
        
                        // Working
        ConditionalFormattingRule rule1 = sheetCF.createConditionalFormattingRule("=COUNTIFS(I$4:I$7,I$4:I$7,J$4:J$7,J$4:J$7)>1");
        
                        // Not Working
        //ConditionalFormattingRule rule1 = sheetCF.createConditionalFormattingRule("=COUNTIFS(I$4:I$7,I$4:I$7,J$4:J$7,J$4:J$7,K$4:K$7,\"*\"&K$4:K$7&\"*\")>1");

        org.apache.poi.ss.usermodel.FontFormatting font = rule1.createFontFormatting();          
        font.setFontStyle(false, true);
        font.setFontColorIndex(IndexedColors.BLUE.index);

        CellRangeAddress[] regions = new CellRangeAddress[]{CellRangeAddress.valueOf("I4:I7")};
        sheetCF.addConditionalFormatting(regions, rule1); 

Excel enter image description here

CodePudding user response:

This is rather an Excel problem than a problem of Apache POI.

The formula =COUNTIFS(I$4:I$7,I$4:I$7,J$4:J$7,J$4:J$7,K$4:K$7,"*"&K$4:K$7&"*")>1 works in a worksheet of Excel 365 and uses spilling array behavior and automatically spreads over a range of four rows. But it does not work as formula in conditional formatting. I guess this is because of problems using the spilling array behavior while concatenating text using & operator.

The same works using following formula =COUNTIFS(I$4:I$7,I4,J$4:J$7,J4,K$4:K$7,"*"&K4&"*")>1 in one cell and filled down then. There the rows in the references I4, J4 and K4 count up while filling down because of not using $ in those references.

This formula also works in conditional formatting. At least using Office Open XML format *.xlsx.

So try:

ConditionalFormattingRule rule1 = sheetCF.createConditionalFormattingRule("COUNTIFS(I$4:I$7,I4,J$4:J$7,J4,K$4:K$7,\"*\"&K4&\"*\")>1");

If the need is to highlight the duplicates in columns I:K instead of only in column I, then the conditional formatting needs to be applied to all the columns I4:K7 instead of only I4:I7. But then the formula must also fixate the columns using $ too: COUNTIFS($I$4:$I$7,$I4,$J$4:$J$7,$J4,$K$4:$K$7,"*"&$K4&"*")>1. Else the column letters would count up for columns right from the first one.

...
ConditionalFormattingRule rule1 = sheetCF.createConditionalFormattingRule("COUNTIFS($I$4:$I$7,$I4,$J$4:$J$7,$J4,$K$4:$K$7,\"*\"&$K4&\"*\")>1");
...
CellRangeAddress[] regions = new CellRangeAddress[]{CellRangeAddress.valueOf("I4:K7")};
...
  • Related