I have the following tabular data stored in a dataframe df:
input1 | input2 | score |
---|---|---|
aaaaaa | xxxxxx | 0.1 |
... | ... | ... |
bbbbbb | yyyyyy | 0.1 |
I want to build a regression model on it using the TensorFlow (Keras) functional API. Because the input columns contain strings, I am using Embedding layers. Here is the network:
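from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model
# creating the first embedding path
input1 = Input(shape=(1,), name="input1")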
embedding1 = Embedding(n_input1, 5)(input1)
vec1 = Flatten()(embedding1)
# creating user embedding path
input2 = Input(shape=(1,), name="input2")
embedding2 = Embedding(n_input2, 5)(input2)
vec2 = Flatten()(embedding2)
# concatenate features
conc = Concatenate()([vec1, vec2])
# add fully-connected-layers
fc1 = Dense(256, activation='relu')(conc)
fc2 = Dense(128, activation='relu')(fc1)
fc3 = Dense(128, activation='relu')(fc2)
out = Dense(1)(fc3)
# Create model and compile it
model = Model([input1, input2], out)
model.compile('adam', 'mean_squared_error')
where n_input1 and n_input2 are the numbers of unique items in the input1 and input2 columns.
Because df.dtypes returns:
input1 object
input2 object
score float64
dtype: object
I do df = data_df.astype({'input1': 'string', 'input2': 'string'}), although I am not sure this is useful.
When trying to fit the model using:
history = model.fit([df.input1, df.input2], df.score, epochs=10, verbose=1)
I end up with the following error:
UnimplementedError: Graph execution error:
Detected at node 'model/Cast' defined at (most recent call last):
...
File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 671, in _conform_to_reference_input
tensor = tf.cast(tensor, dtype=ref_input.dtype)
Node: 'model/Cast'
2 root error(s) found.
(0) UNIMPLEMENTED: Cast string to float is not supported
[[{{node model/Cast}}]]
(1) CANCELLED: Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_965]
I am not really sure what I missed here.
CodePudding user response:
The Embedding layer expects integer indices, not strings. Check the documentation:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding
As it says:
This layer can only be used on positive integer inputs of a fixed range. The tf.keras.layers.TextVectorization, tf.keras.layers.StringLookup, and tf.keras.layers.IntegerLookup preprocessing layers can help prepare inputs for an Embedding layer.
Example:
[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
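To make the string-to-integer step concrete, here is a minimal plain-Python sketch of the idea behind a lookup layer such as tf.keras.layers.StringLookup: each distinct string gets an integer id, and those ids are what the Embedding layer receives. The sample values and the helper names build_vocab and encode are hypothetical and only for illustration; reserving id 0 for unseen strings is an assumption of this sketch, not something stated in the question.

```python
# Hypothetical sample columns standing in for df.input1 and df.input2.
input1_values = ["aaaaaa", "bbbbbb", "aaaaaa", "cccccc"]
input2_values = ["xxxxxx", "yyyyyy", "yyyyyy", "xxxxxx"]


def build_vocab(values):
    """Map each distinct string to an integer id, starting at 1.

    Id 0 is left free for strings not seen when the vocabulary was built
    (an assumption of this sketch).
    """
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab


def encode(values, vocab):
    """Replace each string by its integer id; unseen strings become 0."""
    return [vocab.get(value, 0) for value in values]


vocab1 = build_vocab(input1_values)
vocab2 = build_vocab(input2_values)

ids1 = encode(input1_values, vocab1)
ids2 = encode(input2_values, vocab2)

# The Embedding input dimension must be larger than the largest id,
# so it counts the distinct strings plus the reserved id 0.
n_input1 = len(vocab1) + 1
n_input2 = len(vocab2) + 1
```

The integer lists ids1 and ids2, converted to NumPy arrays, would then be passed to model.fit in place of the string columns, with n_input1 and n_input2 used as the first argument of the two Embedding layers.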