How to classify images with Variational Autoencoder

Time:12-30

I have trained a variational autoencoder (VAE) on both labeled images (1,200) and unlabeled images (4,000), and I have saved the two models separately (vae_fake_img and vae_real_img). I am not sure what to do next. I know VAEs are not designed for classification, but using one for feature extraction seems worth trying. Here is what I have tried or am considering:

  1. I labeled my unlabeled data using k-means clustering on the latent space of the labeled images.
  2. My supervisor suggested training the VAE on the unlabeled images, visualizing the latent space with t-SNE, running k-means clustering, and then using an MLP for the final prediction.
  3. I want to train a Conditional VAE to generate more labeled samples, retrain the VAE, and then feed the reconstruction output (64, 64, 3) into the last three fully connected (FC) layers of the VGG16 architecture for the final classification, as done in the "Encoder as feature extraction" paper.
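The first attempt above (pseudo-labeling with k-means from the labeled latent space) could be sketched roughly as follows. This is only a minimal illustration, assuming the latent vectors have already been extracted from the encoder (for example its mean output) into NumPy arrays; the function name `seeded_kmeans` and the synthetic data are hypothetical and not taken from the original post.

```python
import numpy as np

def seeded_kmeans(z_labeled, y_labeled, z_unlabeled, n_iter=20):
    """Pseudo-label unlabeled latent vectors with k-means seeded by class means.

    Assumption: z_* are (n_samples, latent_dim) arrays of encoder outputs.
    """
    classes = np.unique(y_labeled)  # sorted unique class labels
    # Initial centroids: the mean latent vector of each labeled class.
    centroids = np.stack([z_labeled[y_labeled == c].mean(axis=0) for c in classes])
    z_all = np.concatenate([z_labeled, z_unlabeled], axis=0)
    n_lab = len(z_labeled)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((z_all[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Labeled points stay pinned to their known class, so cluster k
        # always corresponds to classes[k] and no cluster becomes empty.
        assign[:n_lab] = np.searchsorted(classes, y_labeled)
        new_centroids = np.stack(
            [z_all[assign == k].mean(axis=0) for k in range(len(classes))]
        )
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return classes[assign[n_lab:]], centroids

# Synthetic stand-in for real latent vectors: two well-separated classes.
rng = np.random.default_rng(0)
z_lab = np.concatenate([rng.normal(-3, 0.5, (20, 8)), rng.normal(3, 0.5, (20, 8))])
y_lab = np.array([0] * 20 + [1] * 20)
z_unl = np.concatenate([rng.normal(-3, 0.5, (50, 8)), rng.normal(3, 0.5, (50, 8))])

pseudo_labels, centroids = seeded_kmeans(z_lab, y_lab, z_unl)
```

Seeding the centroids with the labeled class means ties each cluster to a known class, which plain k-means does not do by itself. How well this works on real data depends on how cleanly the classes separate in the latent space.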

I have tried many methods for my thesis, and I really need to achieve high accuracy if I want to get a job at my current internship, so any suggestion or guidance is highly appreciated. I have read many autoencoder papers, but the architecture used for classification is not fully explained (or I am not understanding it properly). I want to know which part of the VAE gives better features for a final multiclass classifier. I believe the latent space of the encoder holds more useful information than the decoder's reconstruction.

CodePudding user response:

With autoencoders, you don't need labels to reconstruct the input data, so I think these approaches might give slight improvements:

  • Use a VAE (Variational Autoencoder) instead of a plain AE.
  • Use a Conditional VAE (CVAE), combine all the data, and train the network on all of it.
  • Treat the batch (labeled vs. unlabeled) as the condition, and use a one-hot encoding of each sample's batch as its condition.
  • Inject the condition into both the encoder and the decoder.
  • The latent space should then be free of batch effects, and you can use KNN to give each unlabeled sample the label of its nearest labeled samples.
  • Alternatively, you can train a simple MLP to classify each sample in the latent space. (In this approach, train the MLP only on the labeled data and then test it on the unlabeled data.)
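To make the condition and KNN steps a bit more concrete, here is a rough NumPy sketch. It assumes the condition is injected by simple concatenation (one common choice; the answer does not say how to inject it), that latent vectors have already been extracted from the trained encoder, and that class labels are integers starting at 0. The helper names `add_batch_condition` and `knn_predict` and the synthetic data are hypothetical.

```python
import numpy as np

def add_batch_condition(x, batch_id, n_batches=2):
    """Append a one-hot batch indicator (e.g. 0 = labeled, 1 = unlabeled) to each row.

    Assumption: the same concatenation would be applied to the encoder input
    and to the latent vector before it enters the decoder.
    """
    one_hot = np.zeros((len(x), n_batches), dtype=x.dtype)
    one_hot[:, batch_id] = 1
    return np.concatenate([x, one_hot], axis=1)

def knn_predict(z_train, y_train, z_query, k=5):
    """Majority vote among the k nearest labeled latent vectors."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    dists = ((z_query[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=-1)
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_labels = y_train[nn_idx]  # shape (n_query, k)
    n_classes = int(y_train.max()) + 1
    # Count the votes per class for every query point.
    votes = np.stack([(nn_labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return votes.argmax(axis=1)

# Synthetic stand-in for latent vectors of three well-separated classes.
rng = np.random.default_rng(1)
centers = np.array([-5.0, 0.0, 5.0])
z_lab = np.concatenate([rng.normal(c, 0.5, (30, 8)) for c in centers])
y_lab = np.repeat(np.arange(3), 30)
z_unl = np.concatenate([rng.normal(c, 0.5, (40, 8)) for c in centers])
y_unl_true = np.repeat(np.arange(3), 40)  # known here only because the data is synthetic

pred = knn_predict(z_lab, y_lab, z_unl, k=5)
accuracy = (pred == y_unl_true).mean()

# Shape check for the conditioning helper: 8 features + 2 one-hot columns.
x_cond = add_batch_condition(z_lab, batch_id=0)
```

The MLP alternative would follow the same pattern: fit a small classifier on `z_lab` and `y_lab` only, then apply it to `z_unl`.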

Don't forget batch normalization and dropout layers.

P.S. The most meaningful layer of an AE is the latent space.
