I have 621224 * 1 data in my Excel Sheet and have to calculate total no. of duplicates in the sheet but it takes a lot too much time with =IF(COUNTIF($J$1000:J14353,J5353)>1,1,0)
so this formula might be taking n^2 complexity to find duplicates, I am looking for a formula that takes less time and if possible takes nlogn time, if there is in Excel
As of now I am doing this task manually taking a range of 10k which works in acceptable time and also to add on I have sorted the list I searched for vlookup and found it will take around same time as countif
CodePudding user response:
If you've sorted the data then you can use binary searching, which will be many times faster than your current linear set-up with COUNTIFS
. For example:
=SUMPRODUCT(N(MATCH(A1:A750000,A:A)<>ROW(A1:A750000)))
On my machine, this produced a result in less than 1 second.
If you aren't able to first sort the data, if you have Office 365
you can perform the sorting in-formula:
=LET(ζ,A1:A750000,ξ,SORT(ζ),SUMPRODUCT(N(MATCH(ξ,ξ)<>SEQUENCE(ROWS(ζ)))))
which should still be very fast.
CodePudding user response:
Ignoring IF()
may faster. Try-
=SUM(--(COUNTIFS(A3:A13,A3:A13)>1))