How can I map/hash a tuple to a value within a specified range?

Time:02-23

I am creating a sparse tensor in TensorFlow with a shape of roughly 4,000,000 × 56,000,000. The 56M columns are the pairwise interaction variables between about 10,600 possible values of a feature column, i.e. every combination of two values.

TensorFlow's sparse tensor takes an indices argument, which is a list of lists where each sublist [x, y] gives the row and column of a value within the sparse tensor.

I have the combinations of interaction variables:

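from itertools import combinations

# One list of value pairs per group; df is the existing DataFrame
combos = []
grouped_feature = df.groupby('feature')
for name, group in grouped_feature:
    combos.append([*combinations(group.feature.unique(), 2)])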

This gives me a list of lists of tuples. The tuples in each sublist are the combinations that should be set to 1 in the corresponding row of my sparse tensor.

Then I ran:

indices = []
for i in range(len(combos)):
    for j in range(len(combos[i])):
        indices.append([i, hash(combos[i][j])])

This produces the required list-of-lists format, but I need to replace the hash function with one that maps each combination to one of the 56M column values. How can I do this, or is there a better approach? I could not find a built-in TensorFlow method or function for populating sparse tensors.

CodePudding user response:

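You can take the hash modulo the number of values in the range that you want to map to. Note that two different combinations can then land on the same column, because the modulo folds many hash values onto each index.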

e.g.

NUM_VALUES = 56 * 10**6  # number of columns to map into
indices = []
for i, row_combos in enumerate(combos):
    for combo in row_combos:
        indices.append([i, hash(combo) % NUM_VALUES])
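Hashing modulo the column count is simple, but it can send two different pairs to the same column. Python also salts `hash()` for strings per process unless `PYTHONHASHSEED` is set, so the columns may change between runs when the feature values are strings. If that matters, one alternative (a different technique from the hash above, offered only as a sketch) is to give each feature value an integer id and compute the rank of the pair directly. The names `pair_to_column`, `build_indices` and `value_to_id`, and the tiny example data, are hypothetical and not from the question.

```python
def pair_to_column(i, j, n):
    """Rank of the unordered pair {i, j} among all n*(n-1)/2 pairs of 0..n-1."""
    if i == j:
        raise ValueError("a pair needs two distinct ids")
    if i > j:
        i, j = j, i
    # Pairs whose smaller id is below i come first: i*(2n - i - 1)/2 of them.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def build_indices(combos, value_to_id):
    """Turn per-row lists of value pairs into [row, column] index pairs."""
    n = len(value_to_id)
    indices = []
    for row, row_combos in enumerate(combos):
        for a, b in row_combos:
            col = pair_to_column(value_to_id[a], value_to_id[b], n)
            indices.append([row, col])
    return indices


# Made-up example: 4 feature values give 6 possible columns (0..5).
values = ["a", "b", "c", "d"]
value_to_id = {v: k for k, v in enumerate(values)}
combos = [[("a", "b"), ("a", "d")], [("c", "d")]]
indices = build_indices(combos, value_to_id)
```

With about 10,600 distinct values this yields column numbers from 0 up to 10,600 × 10,599 / 2 − 1, each pair getting its own column, and the mapping stays the same across runs as long as `value_to_id` is built in a fixed order (for example from a sorted list of values).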