I am doing a classification task on MFCC features (time-series data) using an LSTM.
My input has shape (16, 60, 40), i.e. (batch, steps, features).
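For reference, the input and targets can be mimicked with random data standing in for the real MFCC features and labels:

import torch
import torch.nn as nn

# dummy batch with the same shape as the real data: (batch, steps, features)
X = torch.randn(16, 60, 40)
# one class label per sequence; 32 classes, matching class_num below
y = torch.randint(0, 32, (16,))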
class model(nn.Module):
    def __init__(self, ninp, num_layers, class_num, nhid=128):
        super().__init__()
        self.lstm_nets = nn.LSTM(input_size=ninp, hidden_size=nhid, num_layers=num_layers,
                                 batch_first=True, dropout=0.2, bidirectional=False)
        self.FC = nn.Linear(nhid, class_num)
        self.tanh = nn.Tanh()
        self.softmax = nn.LogSoftmax(1)

    def forward(self, X):
        device = 'cuda:0'
        out, (ht, ct) = self.lstm_nets(X)
        # out = ht.contiguous().view(16, -1)
        out = self.tanh(out)
        out = self.FC(out)
        Out = self.softmax(out)
        return Out

model = model(ninp=X.shape[2], num_layers=1, class_num=32, nhid=128)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.5e-4)
If I use out = ht.contiguous().view(16, -1) to flatten the LSTM output, I get this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-96-a7e2ba68dcd9> in <module>()
11
12 optimizer.zero_grad()
---> 13 y_pred = model(X)
14 # calculate loss function
15 loss = loss_function(y_pred, y)
3 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
101
102 def forward(self, input: Tensor) -> Tensor:
--> 103 return F.linear(input, self.weight, self.bias)
104
105 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x32 and 128x32)
If I instead use out = out.contiguous().view(16, -1) to flatten the LSTM output, I get: RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x7680 and 128x32)
If I remove the flatten step entirely, I get: RuntimeError: Expected target size [16, 32], got [16]
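To make the shapes concrete, here is a quick trace with the dummy input from above (random data, same shapes as my real batch):

out, (ht, ct) = nn.LSTM(40, 128, batch_first=True)(X)
print(out.shape)  # torch.Size([16, 60, 128]) - one output per timestep
print(ht.shape)   # torch.Size([1, 16, 128])  - final hidden state only

# flattening all timesteps: 60 * 128 = 7680 features, but FC expects 128
print(out.contiguous().view(16, -1).shape)  # torch.Size([16, 7680])

# no flattening: FC maps (16, 60, 128) -> (16, 60, 32); CrossEntropyLoss then
# reads dim 1 (60) as the class dimension and demands a target of size [16, 32]
print(nn.Linear(128, 32)(out).shape)        # torch.Size([16, 60, 32])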
In addition, the examples I found online do not flatten the output of the LSTM.
Thanks for any help.
CodePudding user response:
At each timestep of an LSTM, the input goes through a small neural network and the resulting hidden state is passed on to the next timestep.
The output out of
out, (ht, ct) = self.lstm_nets(X)
contains the outputs of ALL timesteps (i.e. the hidden state produced at every step of the sequence). In classification, however, you usually only care about the LAST output. You can get it like this:
out = out[:, -1]
This output now has shape (batch_size, hidden_size), i.e. (16, 128) here, which is exactly the 128-feature input that the FC layer expects.
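As a side check (assuming a single-layer, unidirectional LSTM as in the question), the last timestep of out is identical to the final hidden state ht[-1]:

out, (ht, ct) = nn.LSTM(40, 128, batch_first=True)(torch.randn(16, 60, 40))
last = out[:, -1]
print(last.shape)                 # torch.Size([16, 128])
print(torch.equal(last, ht[-1]))  # True - same values, so ht[-1] works too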
So in your case, the forward function should look like this:
def forward(self, X):
    device = 'cuda:0'
    out, (ht, ct) = self.lstm_nets(X)
    out = out[:, -1]           # keep only the last timestep: (16, 60, 128) -> (16, 128)
    out = self.tanh(out)
    out = self.FC(out)         # (16, 128) -> (16, 32) class scores
    Out = self.softmax(out)    # log-probabilities over the 32 classes
    return Out
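With that change in place, a quick end-to-end check (a sketch reusing the dummy X and y from the question) confirms the shapes line up:

net = model(ninp=X.shape[2], num_layers=1, class_num=32, nhid=128)  # class with the fixed forward
y_pred = net(X)
print(y_pred.shape)  # torch.Size([16, 32]) - one score vector per sequence
loss = nn.CrossEntropyLoss()(y_pred, y)  # a target of shape [16] is now accepted

One caveat: nn.CrossEntropyLoss already applies log-softmax internally, so combined with the LogSoftmax layer in the model it gets applied twice. The usual pairing is LogSoftmax with nn.NLLLoss, or raw FC outputs with CrossEntropyLoss.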