Dimensionality Reduction Autoencoder Pytorch


I'm trying to use the autoencoder below as a tool for dimensionality reduction. How can I "extract" the hidden (bottleneck) layer and use its output for that purpose?

My original dataset has already been standard-scaled.
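For context, a minimal scaling sketch (assuming scikit-learn and hypothetical unscaled DataFrames train_raw/test_raw; the scaler is fitted on the training set only to avoid leakage):

import pandas as pd
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# train_raw / test_raw are hypothetical unscaled DataFrames
train = pd.DataFrame(scaler.fit_transform(train_raw), columns=train_raw.columns)
test = pd.DataFrame(scaler.transform(test_raw), columns=test_raw.columns)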

Here I define a Dictionary to centralize the values

CONFIG = {
    'BATCH_SIZE': 1024,
    'LR': 1e-4,
    'WD': 1e-8,
    'EPOCHS': 50,
}

Here I convert the values of my train and test dataframes into tensors

import torch

t_test = torch.FloatTensor(test.values)
t_train = torch.FloatTensor(train.values)

Here I create data loaders

# Shuffling is unnecessary for evaluation, so the test loader keeps its order
loader_test = torch.utils.data.DataLoader(dataset=t_test,
                                          batch_size=CONFIG['BATCH_SIZE'],
                                          shuffle=False)

loader_train = torch.utils.data.DataLoader(dataset=t_train,
                                           batch_size=CONFIG['BATCH_SIZE'],
                                           shuffle=True)

Here I define the AutoEncoder (AE) class

class AE(torch.nn.Module):
    def __init__(self):
        super().__init__()

        # Compress the 31 input features down to a 4-dimensional bottleneck
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(31, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 8),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4),
        )

        # Reconstruct the 31 features from the bottleneck
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(4, 8),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 31),
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

Here I define the model, the loss function, and the optimizer

model = AE()

loss_function = torch.nn.MSELoss()

optimizer = torch.optim.Adam(model.parameters(),
                             lr = CONFIG['LR'],
                             weight_decay = CONFIG['WD'])

Here I run the training loop

import pandas as pd

epochs = CONFIG['EPOCHS']
dict_list = []
model.train()
for epoch in range(epochs):
    for ix, batch in enumerate(loader_train):
        reconstructed = model(batch)
        loss = loss_function(reconstructed, batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # loss.item() extracts the scalar loss value for logging
        temp_dict = {'Epoch': epoch, 'Batch_N': ix,
                     'Batch_L': batch.shape[0], 'loss': loss.item()}
        dict_list.append(temp_dict)

df_learning_o = pd.DataFrame(dict_list)
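The test loader defined above is never used in the loop; a minimal held-out check of the reconstruction loss might look like this (a sketch under the setup above, not part of the original code):

model.eval()
with torch.no_grad():
    # Average the per-sample reconstruction loss over the whole test set
    total = sum(loss_function(model(batch), batch).item() * batch.shape[0]
                for batch in loader_test)
print('Mean test reconstruction loss:', total / len(t_test))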

CodePudding user response:

You can simply return not just the decoded output but also the encoded embedding, like this:

class AE(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(31, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 8),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4),
        )

        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(4, 8),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 31),
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        # Return both the bottleneck embedding and the reconstruction
        return encoded, decoded

When you pass something to your model (in the train loop for example), you would have to change it to the following:

encoded, reconstructed = model(batch)
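Note that the loss in the training loop should still compare reconstructed with batch; encoded is only there so you can use it downstream.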

Now you can do whatever you'd like with the encoded embedding, i.e. the dimensionally reduced input.
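For instance, after training you could pull the embeddings for the full dataset like this (a sketch; note that calling model.encoder directly also works, without changing forward() at all):

model.eval()
with torch.no_grad():
    # Unpack the modified forward() ...
    train_embedding, _ = model(t_train)
    # ... or call the encoder submodule directly
    test_embedding = model.encoder(t_test)

print(train_embedding.shape)  # (n_train_samples, 4)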
