Home > database >  Looking for a better formula in Excel to Calculate Duplicates
Looking for a better formula in Excel to Calculate Duplicates

Time:12-02

I have 621224 * 1 data in my Excel Sheet and have to calculate total no. of duplicates in the sheet but it takes a lot too much time with =IF(COUNTIF($J$1000:J14353,J5353)>1,1,0) so this formula might be taking n^2 complexity to find duplicates, I am looking for a formula that takes less time and if possible takes nlogn time, if there is in Excel

As of now I am doing this task manually taking a range of 10k which works in acceptable time and also to add on I have sorted the list I searched for vlookup and found it will take around same time as countif

CodePudding user response:

If you've sorted the data then you can use binary searching, which will be many times faster than your current linear set-up with COUNTIFS. For example:

=SUMPRODUCT(N(MATCH(A1:A750000,A:A)<>ROW(A1:A750000)))

On my machine, this produced a result in less than 1 second.

If you aren't able to first sort the data, if you have Office 365 you can perform the sorting in-formula:

=LET(ζ,A1:A750000,ξ,SORT(ζ),SUMPRODUCT(N(MATCH(ξ,ξ)<>SEQUENCE(ROWS(ζ)))))

which should still be very fast.

CodePudding user response:

Ignoring IF() may faster. Try-

=SUM(--(COUNTIFS(A3:A13,A3:A13)>1))

enter image description here

  • Related