Key points are not available for this paper at this time.
The unconditional generation of high fidelity images is a longstanding for testing the performance of image decoders. Autoregressive image have been able to generate small images unconditionally, but the of these methods to large images where fidelity can be more readily has remained an open problem. Among the major challenges are the to encode the vast previous context and the sheer difficulty of a distribution that preserves both global semantic coherence and of detail. To address the former challenge, we propose the Subscale Network (SPN), a conditional decoder architecture that generates an image a sequence of sub-images of equal size. The SPN compactly captures-wide spatial dependencies and requires a fraction of the memory and the required by other fully autoregressive models. To address the challenge, we propose to use Multidimensional Upscaling to grow an image both size and depth via intermediate stages utilising distinct SPNs. We SPNs on the unconditional generation of CelebAHQ of size 256 and of from size 32 to 256. We achieve state-of-the-art likelihood results in settings, set up new benchmark results in previously unexplored and are able to generate very high fidelity large scale samples on the of both datasets.
Menick et al. (Tue,) studied this question.