I have seen code with model([states, moves]) and code with model.predict([states, moves]). I think both of them are doing Q-learning, but when I swap model([states, moves]) for model.predict([states, moves]) it takes an enormous amount of time. Both return values.
P.S. I do get negative values; is that acceptable?
CodePudding user response:
Most of your question can be answered by this StackOverflow question; the answers there go over the minute differences between model() and model.predict().
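To make the timing difference concrete, here is a minimal sketch with a hypothetical toy model (the layer sizes and the single-input shape are assumptions, not your architecture). model(x) is a plain eager forward pass and is cheap to call once per step, while model.predict(x) sets up its own batched prediction loop on every call, which adds a noticeable fixed overhead when you call it repeatedly on tiny inputs.

import numpy as np
import tensorflow as tf

# Hypothetical toy model: 4-dimensional state in, 2 Q-values out.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="linear"),
])

state = np.random.rand(1, 4).astype("float32")

q_eager = model(state)            # tf.Tensor; fast per call, good inside a training loop
q_predict = model.predict(state)  # np.ndarray; heavy per call, intended for large batches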
Regarding negative values: take a look at the definition of Q-learning. Depending on your problem space, you may want to allow negative values. A Q-value should represent the discounted reward for taking an action a in a specific state s. If you have negative rewards, then you should expect (and allow) negative Q-values; if not, you should probably expect all values to be non-negative.
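As a quick sketch of where negative Q-values come from, here is the standard Bellman target with an assumed per-step penalty of -1.0 (the reward and the next-state Q-values below are made-up numbers for illustration):

import numpy as np

gamma = 0.99                      # discount factor
reward = -1.0                     # assumed per-step penalty; negative rewards are common
q_next = np.array([-0.5, -1.2])   # hypothetical Q-values of the next state

# Q-learning target: r + gamma * max_a' Q(s', a')
target = reward + gamma * np.max(q_next)
print(target)  # -1.495, a perfectly valid Q-value if your reward scheme is negative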
Your architecture is another place to check. The activation of your final layer determines whether negative values are possible at all. Activations such as "linear" (or no activation at all in some packages) allow all real values, whereas "relu" only allows non-negative outputs.
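A minimal sketch of the two output heads in Keras (the 2-action output size is an assumption for illustration):

import tensorflow as tf

# "linear" (also the Dense default) can output any real number,
# so negative Q-values are possible:
q_head_linear = tf.keras.layers.Dense(2, activation="linear")

# "relu" zeroes out anything below zero, so this head can never
# predict a negative Q-value even when the true return is negative:
q_head_relu = tf.keras.layers.Dense(2, activation="relu")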
FYI, model() invokes model's __call__() method: when a class instance is called directly with arguments, Python invokes its __call__ method. Keras/TensorFlow simply overrides it to run the model's forward pass.