Convert Tensor of hex strings to int

I have a dataset of tf.RaggedTensors with strings representing hexadecimal numbers that look like this:

[
 [[b'F6EE', b'BFED', b'4EEA', b'00EE', b'77AE', b'1FBE', b'1A6E',
   b'5AEB', b'6A0E', b'212F'],
    ...
  [b'FFEE', b'FFED', b'FEED', b'FDEE', b'FAAE', b'FFBE', b'FA8E',
   b'FAEB', b'FA0E', b'E12F']],

  ...

 [[b'FFEE', b'FFED', b'FEED', b'FDEE', b'FAAE', b'FFBE', b'FA8E',
   b'FAEB', b'FA0E', b'E12F'],
    ...
  [b'B6EE', b'BFED', b'4EEA', b'00EE', b'77AE', b'1FBE', b'1A6E',
   b'5AEB', b'6A0E', b'212F']]
]

I want to convert it into a Tensor of int values, but tf.strings.to_number(tensor, tf.int32) has no option to specify base 16. Are there any alternatives?

The dataset contains tf.RaggedTensors, but the target shape is (batch_size, 100, 10). This might be helpful if a custom function is needed.

CodePudding user response:

I think you're looking for something like this.

I first create an example 3D tensor like the one you have.

import tensorflow as tf

a = tf.convert_to_tensor(['F6EE', 'BFED', '4EEA', '00EE', '77AE', '1FBE', '1A6E',
   '5AEB', '6A0E', '212F'])
b = tf.convert_to_tensor(['FFEE', 'FFED', 'FEED', 'FDEE', 'FAAE', 'FFBE', 'FA8E',
   'FAEB', 'FA0E', 'E12F'])
tensor = tf.ragged.stack([[a, b]]).to_tensor()

print(tensor)

tf.Tensor(
[[[b'F6EE' b'BFED' b'4EEA' b'00EE' b'77AE' b'1FBE' b'1A6E' b'5AEB'
   b'6A0E' b'212F']
  [b'FFEE' b'FFED' b'FEED' b'FDEE' b'FAAE' b'FFBE' b'FA8E' b'FAEB'
   b'FA0E' b'E12F']]], shape=(1, 2, 10), dtype=string)

Then, based on this answer, I created a custom function and mapped it over every element of the tensor. Here the function parses each string as a base-16 integer.

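def my_cast(t):
    # Fetch the element's Python bytes value (this needs eager execution).
    val = tf.keras.backend.get_value(t)
    # Parse the bytes as a base-16 integer.
    return int(val, 16)

# Flatten to 1-D so map_fn visits every string, and keep the original shape.
shape = tf.shape(tensor)
elems = tf.reshape(tensor, [-1])

res = tf.map_fn(fn=my_cast, elems=elems, fn_output_signature=tf.int32)
# Restore the original (1, 2, 10) shape.
res = tf.reshape(res, shape)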

print(res)

The output is the tensor:

tf.Tensor(
[[[63214 49133 20202   238 30638  8126  6766 23275 27150  8495]
  [65518 65517 65261 65006 64174 65470 64142 64235 64014 57647]]], shape=(1, 2, 10), dtype=int32)

Adding fn_output_signature=tf.int32 to tf.map_fn is important: without it, tf.map_fn expects the output to have the same dtype as the input (string here), so the argument is what lets you get an int32 tensor back.
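
The function above reads each value back into Python, which needs eager execution, so it may not work inside tf.function or dataset.map, and it handles one element at a time. As an alternative, here is a minimal sketch that stays within TensorFlow ops. The name hex_to_int and its width parameter are hypothetical, and it assumes every string has exactly width valid hex digits (4 in the data shown), with no '0x' prefix.

```python
import tensorflow as tf

def hex_to_int(hex_strings, width=4):
    """Hypothetical helper: fixed-width hex strings -> int32 tensor of the same shape."""
    shape = tf.shape(hex_strings)
    flat = tf.reshape(hex_strings, [-1])

    # Decode every string into its character code points, shape (n, width).
    # Assumption: all strings have exactly `width` characters.
    codes = tf.strings.unicode_decode(flat, "UTF-8").to_tensor()

    # Map code points to digit values:
    # 'a'-'f' (97-102) -> 10-15, 'A'-'F' (65-70) -> 10-15, '0'-'9' (48-57) -> 0-9.
    digits = tf.where(
        codes >= 97, codes - 87,
        tf.where(codes >= 65, codes - 55, codes - 48))

    # Positional weights, most significant digit first, e.g. [4096, 256, 16, 1].
    weights = tf.constant([16 ** i for i in reversed(range(width))], dtype=tf.int32)

    values = tf.reduce_sum(digits * weights, axis=-1)
    return tf.reshape(values, shape)

tensor = tf.constant([[["F6EE", "BFED"], ["FFEE", "E12F"]]])
print(hex_to_int(tensor))
```

With int32 output this only holds for width up to 7. Longer strings would need an int64 variant, and invalid characters are not detected.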