PixelVAE: A Latent Variable Model for Natural Images

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than those of a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
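The key component of the PixelCNN-style decoder described above is a raster-scan "causal" (masked) convolution, so that each pixel's distribution depends only on pixels above and to its left. Below is a minimal NumPy sketch of that masking idea; the function names and the plain-loop convolution are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def pixelcnn_mask(k, mask_type="A"):
    """Raster-scan causal mask for a k x k conv kernel.

    Type 'A' also masks the centre pixel (used in the first layer, so a
    pixel cannot see itself); type 'B' lets the centre through (later layers).
    """
    m = np.ones((k, k), dtype=np.float32)
    c = k // 2
    # zero out the centre (type A) or everything right of it (type B) ...
    m[c, c + (1 if mask_type == "B" else 0):] = 0.0
    # ... and every row below the centre
    m[c + 1:, :] = 0.0
    return m

def masked_conv2d(x, kernel, mask):
    """'Same'-padded 2-D convolution with the causal mask applied to the kernel."""
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, p)          # zero-pad so output has the input's shape
    out = np.zeros_like(x)
    mk = kernel * mask         # masked kernel: only past (above/left) pixels contribute
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * mk)
    return out
```

In a PixelVAE-style decoder, the conditioning on the latent code would then amount to adding a (spatially broadcast or upsampled) projection of the latent to such masked-convolution features before the output distribution, so only a few of these expensive autoregressive layers are needed on top of the VAE decoder.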
References in zbMATH (referenced in 4 articles):
- Marino, Joseph: Predictive coding, variational autoencoders, and biological connections (2022)
- Akuzawa, Kei; Iwasawa, Yusuke; Matsuo, Yutaka: Information-theoretic regularization for learning global features by sequential VAE (2021)
- Tonolini, Francesco; Radford, Jack; Turpin, Alex; Faccio, Daniele; Murray-Smith, Roderick: Variational inference for computational imaging inverse problems (2020)
- Mansbridge, Alex; Fierimonte, Roberto; Feige, Ilya; Barber, David: Improving latent variable descriptiveness by modelling rather than ad-hoc factors (2019)