I am building a network in Keras that is similar to SE-Net (https://github.com/titu1994/keras-squeeze-excite-network/blob/master/se.py), but with some differences.
Suppose I want to build a layer sequence like this:
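import keras
# Variable-size RGB input
inputs = keras.layers.Input((None, None, 3))
x1 = keras.layers.Conv2D(filters=32, kernel_size=(3, 3))(inputs)
# Global average pooling returns shape (batch, 32); reshape to (batch, 1, 1, 32) so Conv2D can be applied
x_gp = keras.layers.GlobalAveragePooling2D()(x1)
x_gp = keras.layers.Reshape((1, 1, 32))(x_gp)
x2 = keras.layers.Conv2D(filters=32, kernel_size=(1, 1))(x_gp)
x3 = keras.layers.Conv2D(filters=8, kernel_size=(1, 1))(x2)
x2_ = keras.layers.Conv2D(filters=32, kernel_size=(1, 1))(x3)
# Sigmoid is applied through an Activation layer
x_se = keras.layers.Activation('sigmoid')(x2_)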
I would like to know whether computing x_se like this is possible. Please tell me if I am doing something wrong.
CodePudding user response:
You can certainly experiment with sigmoid as an activation for CNN layers too, but there are reasons why sigmoid is usually not used with CNN layers:
1. The sigmoid function is monotonic, but its derivative is not, and the derivative approaches zero for inputs of large magnitude, so training can get stuck.
2. The sigmoid output is confined to the range (0, 1), so it is never zero-centered.
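To make the two points above concrete, here is a minimal sketch in plain Python (no Keras involved) that evaluates the sigmoid and its derivative at a few inputs. The helper names `sigmoid` and `sigmoid_grad` are made up for this illustration and are not part of any library.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The output stays inside (0, 1); the gradient peaks at x = 0
# and shrinks toward zero as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

The gradient rises up to x = 0 and falls afterwards, which is why it is not monotonic, and it is already tiny at |x| = 10, which is the saturation that can stall training.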
If you are experimenting with sigmoid in CNN layers, I would suggest using it for only a few layers. You can also give swish a try.
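For reference, swish is commonly defined as x * sigmoid(x). The sketch below is plain Python with a hypothetical helper name `swish`, intended only to show the shape of the function. Recent Keras versions appear to expose it as a built-in activation, but check that against your installed version.

```python
import math

def swish(x):
    """Swish activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    z = math.exp(x)
    return x * z / (1.0 + z)

# Unlike sigmoid, swish is unbounded above and close to the identity
# for large positive x, while small negative outputs are still allowed.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:6.1f}  swish={swish(x):.5f}")
```

Because the output grows with x for positive inputs, the gradient does not vanish there the way it does for sigmoid.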