Generative Adversarial Transformers

1 Mar 2021  ·  Drew A. Hudson, C. Lawrence Zitnick

We introduce the GANformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational cost, so that it can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, supporting the refinement of each in light of the other and encouraging the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network. We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing that it achieves state-of-the-art results in terms of image quality and diversity while enjoying fast learning and better data efficiency. Further qualitative and quantitative experiments offer insight into the model's inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. An implementation of the model is available at https://github.com/dorarad/gansformer.
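
The bipartite update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the learned query/key/value projections are replaced by identity maps, and the scale and shift are simple stand-ins for learned mappings. It only shows how n image features can attend to k latents at O(n · k) cost and then be modulated multiplicatively.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def normalize(x, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance.

    Assumption: per-vector normalization is used here only for simplicity.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def bipartite_update(features, latents):
    """One latents -> features step with multiplicative integration.

    features: n vectors (image positions); latents: k vectors, typically k << n.
    No feature-to-feature attention is computed, so the cost is O(n * k).
    """
    attended = attend(features, latents, latents)  # n x d
    updated = []
    for x, a in zip(features, attended):
        # Assumption: scale = 1 + a and shift = a stand in for learned mappings.
        updated.append([(1.0 + ai) * xi + ai for xi, ai in zip(normalize(x), a)])
    return updated
```

In this sketch the reverse direction (features informing the latents) would be `attend(latents, features, features)`, which is likewise linear in the number of image positions.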


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Generation | Cityscapes | VQGAN | FID-10k-training-steps | 173.7971 | #6 |
| Image Generation | Cityscapes | StyleGAN2 | FID-10k-training-steps | 8.35 | #3 |
| Image Generation | Cityscapes | GANformer | FID-10k-training-steps | 5.7589 | #2 |
| Image Generation | Cityscapes | GAN | FID-10k-training-steps | 11.5652 | #4 |
| Image Generation | Cityscapes | SAGAN | FID-10k-training-steps | 12.8077 | #5 |
| Image Generation | CLEVR | GAN | FID-5k-training-steps | 25.0244 | #4 |
| Image Generation | CLEVR | SAGAN | FID-5k-training-steps | 26.0433 | #5 |
| Image Generation | CLEVR | VQGAN | FID-5k-training-steps | 32.6031 | #6 |
| Image Generation | CLEVR | StyleGAN2 | FID-5k-training-steps | 16.0534 | #3 |
| Image Generation | CLEVR | GANformer | FID-5k-training-steps | 9.1679 | #2 |
| Image Generation | FFHQ 256 x 256 | GANformer | FID | 7.42 | #20 |
| Image Generation | LSUN Bedroom 256 x 256 | GANformer | FID-10k-training-steps | 6.5085 | #2 |
| Image Generation | LSUN Bedroom 256 x 256 | VQGAN | FID-10k-training-steps | 59.6333 | #6 |
| Image Generation | LSUN Bedroom 256 x 256 | StyleGAN2 | FID-10k-training-steps | 11.5255 | #3 |
| Image Generation | LSUN Bedroom 256 x 256 | SAGAN | FID-10k-training-steps | 14.0595 | #5 |
| Image Generation | LSUN Bedroom 256 x 256 | GAN | FID-10k-training-steps | 12.1567 | #4 |
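
The metric values are Fréchet Inception Distance (FID) scores, where lower is better; the metric names suggest most were measured after a fixed number of training steps. For reference, the standard definition (general background, not taken from this page) fits Gaussians to Inception features of real and generated images, with means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$, and computes:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The score is zero when the two fitted Gaussians coincide and grows as their means or covariances diverge.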

Methods