Home > OS >  Better sharding function for int64 keys in golang?
Better sharding function for int64 keys in golang?

Time:01-13

I am using concurrent map from this repo where it is possible to select the key type when creating a map using NewWithCustomShardingFunction. I just need to provide my own sharding function for int64 keys and that's what I am using here.

I am also using latest version of Go where I can use generics so I decided to use concurrent-map with keys as int64 by implementing my own sharding function.

import (
    cmap "github.com/orcaman/concurrent-map/v2"
)

func shardingFunc(key int64) uint32 {
    return uint32(key) // TODO - create a better sharding function that does not rely on how uint32 type conversion works
}

func main() {
    testMap := cmap.NewWithCustomShardingFunction[int64, *definitions.CustomerProduct](shardingFunc)
    // ... use the map ...
}

I wanted to know whether my sharding function is okay here for int64 keys or should I have better sharding function? I don't want to be in situation where it can give me index out of range error or any other issues.

CodePudding user response:

The sharding function is a hash function. The function should uniformly distribute the keys over a 32 bit space.

If the low four bytes if your init64 values are uniformly distributed, then uint32(key) will work as a sharding function.

An example of where uint32(key) is a bad choice is where the low bytes have constant values. For example, if the key values are like 0x00010000, 0x00020000, ..., then uint32(key) will evaluate to zero. This is not a uniform distribution.

If you don't know how your int64 keys are distributed, then it's best to use all of the key's bits in the shard function. Here's one that uses XOR:

func shardingFunc(key int64) uint32 {
    return uint32(key) ^ uint32(key >> 32) 
}

CodePudding user response:

For something robust, use crypto/sha256 to hash the key then convert (some of) it to unint32:

func shardingFunc(key int64) uint32 {
    bytes := sha256.Sum256(new(big.Int).SetInt64(key).Bytes())
    return binary.BigEndian.Uint32(bytes[0:32])
}

There are more efficient, although more verbose, ways of coding this, but hopefully you get the idea.

  • Related