The dataset is based on the original MNIST dataset. Compared to the original dataset, the digits are scaled down by a factor of $0.75$ so that there is more room for the random translation. PolyMNIST consists of 5 different modalities.
The background of every modality $\mathbf{x}_m$ is a random patch of size $28 \times 28$ cropped from a large image, and the digit is placed at a random position within the patch. With this setup, every modality carries modality-specific information, given by its background image, and shared information, given by the digit, which is common to all modalities. An additional difficulty compared to the original PolyMNIST is the random translation of the digits.
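The construction described above can be sketched roughly as follows. This is a minimal illustration rather than the dataset's actual generation code: the function name `make_modality_sample`, the nearest-neighbour downscaling, and the way the digit is blended onto the background (digit pixels drawn in white) are all assumptions made for the example.

```python
import numpy as np


def make_modality_sample(digit, background, rng, scale=0.75, patch=28):
    """Build one modality image (hypothetical helper, not the official code).

    digit:      (28, 28) float array in [0, 1], the MNIST digit
    background: (H, W, 3) float array in [0, 1], the large background image
    """
    # Downscale the digit by `scale` with nearest-neighbour sampling (28 -> 21 px).
    size = int(round(digit.shape[0] * scale))
    idx = np.minimum((np.arange(size) / scale).astype(int), digit.shape[0] - 1)
    small = digit[np.ix_(idx, idx)]

    # Modality-specific part: a random patch cropped from the large image.
    h, w = background.shape[:2]
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    out = background[top:top + patch, left:left + patch].copy()

    # Random translation: any offset at which the digit fits inside the patch.
    dy = rng.integers(0, patch - size + 1)
    dx = rng.integers(0, patch - size + 1)

    # Assumed blending: digit intensity acts as an alpha mask for white pixels.
    mask = small[..., None]
    region = out[dy:dy + size, dx:dx + size]
    out[dy:dy + size, dx:dx + size] = (1.0 - mask) * region + mask
    return out, (int(dy), int(dx))


# Usage: the same digit (shared information) on 5 different backgrounds.
rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[4:24, 10:18] = 1.0                                 # stand-in for an MNIST digit
backgrounds = [rng.random((64, 64, 3)) for _ in range(5)]  # stand-ins for large images
modalities = [make_modality_sample(digit, bg, rng)[0] for bg in backgrounds]
```

Each of the five resulting images has shape `(28, 28, 3)`; the digit is the same in all of them, while the background and the digit position differ per modality.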