How can I map/hash a tuple to a value within a specified range?

I am creating a sparse tensor in TensorFlow that is about 4,000,000 × 56,000,000. The 56M columns are the interaction variables between the roughly 10,600 possible values of a feature column, i.e., all pairwise combinations of those values.
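As a sanity check on the column count, the number of unordered pairs among n values is n(n-1)/2, which for n = 10,600 is about 56.2M. The same counting also gives one possible collision-free way to number the columns. The sketch below assumes each feature value has already been assigned a position 0..n-1 (for example by sorting the distinct values); that step, and the helper names num_pairs and pair_to_column, are hypothetical and not part of the original code.

```python
from itertools import combinations


def num_pairs(n):
    """Number of unordered pairs of distinct items among n items, i.e. C(n, 2)."""
    return n * (n - 1) // 2


def pair_to_column(a, b, n):
    """Map vocabulary positions a != b (both in [0, n)) to a unique column in [0, C(n, 2))."""
    if a > b:
        a, b = b, a
    # Pairs whose first position is 0..a-1 occupy the first a*n - a*(a+1)//2 columns.
    return a * n - a * (a + 1) // 2 + (b - a - 1)


print(num_pairs(10_600))  # 56174700

# On a small vocabulary, every pair lands on its own column with no gaps.
n = 5
cols = [pair_to_column(a, b, n) for a, b in combinations(range(n), 2)]
print(cols)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Unlike hashing, this numbering is a bijection, so no two pairs share a column, but it only works if the full set of feature values is known up front.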

TensorFlow's sparse tensor takes an indices argument, which is a list of lists where each sublist [x, y] gives the row and column of a stored value in the sparse tensor.
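To make the [row, column] format concrete, here is a small pure-Python sketch that expands a few index pairs into a dense matrix. The to_dense helper is hypothetical and only suitable for tiny shapes; its three arguments mirror the indices, values and dense_shape that tf.sparse.SparseTensor takes, but no TensorFlow call is made here.

```python
def to_dense(indices, values, dense_shape):
    """Expand COO-style ([row, col], value) entries into a nested-list matrix."""
    rows, cols = dense_shape
    dense = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense


print(to_dense([[0, 2], [1, 0], [1, 3]], [1, 1, 1], (2, 4)))
# [[0, 0, 1, 0], [1, 0, 0, 1]]
```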

I have the combinations of interaction variables:

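from itertools import combinations
combos = []
# 'row_id' is a hypothetical row-identifier column; grouping by 'feature' itself would give no pairs.
grouped = df.groupby('row_id')
for name, group in grouped:
    # Sorting stores each pair in one canonical order, e.g. ('a', 'b') rather than ('b', 'a').
    combos.append(list(combinations(sorted(group.feature.unique()), 2)))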
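The order of the values matters here: itertools.combinations emits pairs in the order of its input, and pandas' unique() returns values in order of appearance, so the same two values can come out as ('a', 'b') in one row and ('b', 'a') in another, and would then hash to different columns. A minimal sketch of canonicalizing the pairs first, using plain Python lists in place of a DataFrame group (row_pairs is a hypothetical helper name):

```python
from itertools import combinations


def row_pairs(values):
    """Return every unordered pair of distinct values, each in sorted order."""
    unique_sorted = sorted(set(values))
    return list(combinations(unique_sorted, 2))


print(row_pairs(["b", "a", "c", "a"]))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```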

This gives me a list of lists of tuples: each sublist holds one row's pairs, and each tuple is a combination whose entry should be 1 in my sparse tensor.

Then I ran:

indices = []
for i, row_combos in enumerate(combos):
    for combo in row_combos:
        indices.append([i, hash(combo)])

That produces the list-of-lists format I need, but I have to replace hash() with something that maps each combination to one of the 56M column indices. How can I do this? Or is there a better way? I could not find a built-in method or function in TensorFlow for populating sparse tensors.

Answer:

You can take the hash modulo the number of values in the range you want to map to.

For example:

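NUM_VALUES = 56 * 10**6  # size of the target range (number of columns)
indices = []
for i, row_combos in enumerate(combos):
    for combo in row_combos:
        # Python's % with a positive divisor always yields a value in [0, NUM_VALUES),
        # even when hash() returns a negative number.
        indices.append([i, hash(combo) % NUM_VALUES])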
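One caveat: if the feature values are strings, the built-in hash() is salted per interpreter process (see PYTHONHASHSEED), so the same pair can land in a different column each time the script runs. A minimal sketch of a reproducible alternative, assuming each value can be turned into a string with str(); stable_bucket is a hypothetical helper, and blake2b is just one of the hashlib algorithms that would work:

```python
import hashlib

NUM_VALUES = 56 * 10**6


def stable_bucket(pair, num_buckets=NUM_VALUES):
    """Hash a tuple of values to an int in [0, num_buckets), identically in every Python process."""
    # Join with a separator unlikely to occur in the values, so ("ab", "c") differs from ("a", "bc").
    key = "\x1f".join(str(part) for part in pair).encode("utf-8")
    # An 8-byte blake2b digest gives a 64-bit value that does not depend on PYTHONHASHSEED.
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


print(stable_bucket(("a", "b")))  # the same number on every run
print(-7 % 5)  # 3: a positive divisor never gives a negative remainder
```

Either way, reducing roughly 56M pairs modulo 56M means that distinct pairs will sometimes share a column; if that matters, each pair needs its own index rather than a hash.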