Some time back we implemented a slight variation of this Optimizing Protected Indexes approach from SQL Server Central to indexing encrypted data in the ufn_GroupData sample function:
CREATE FUNCTION [dbo].[fncGetBucket]
(
@String NVARCHAR(3500)
)
RETURNS INT
AS
BEGIN
DECLARE @Result INT = NULL;
IF @String IS NOT NULL
SET @Result = HASHBYTES('SHA2_256', @String) % 20000;
RETURN @Result;
END
A few million rows are happily stored in the database using the result of this UDF as a "bucketing" index.
Now we would like to be able to generate the matching bucket value in C# and the deceptively simple SQL above doesn't make it clear how to do that.
In fact, I've found it hard to get started. But I can say this code:
byte[] encoded = SHA256.Create().ComputeHash(Encoding.Unicode.GetBytes(str));
generates the same hash as HASHBYTES, so I'm past the first step.
So what SQL is actually doing with that modulo operator???
All help appreciated!
CodePudding user response:
HASHBYTES('SHA2_256')
returns a 32 byte ByteArray
What internally is happening is
cast(HASHBYTES('SHA2_256', @String) as int) % 20000
So there's an integer overflow here that is gracefully ignored by SQL Server.
This is undefined behaviour and not easily converted to C#.
You can achieve something similar by taking the last 4 bytes and modulo those.
This seems to work for your mod 20000.
byte[] encoded = SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes("asdf"));
var last4Bytes = encoded.Skip(28).Take(4).Reverse().ToArray();
var myInt = BitConverter.ToInt32(last4Bytes);
var moduloResult = myInt % 20000;
This gives me the same result as
print HASHBYTES('SHA2_256', 'asdf') % 20000
12635