A Confidence-Based Interface for Neuro-Symbolic Visual Question Answering

We present a neuro-symbolic visual question answering (VQA) approach for the CLEVR dataset that combines deep neural networks with answer-set programming (ASP), a logic-based paradigm for declarative problem solving. We provide a translation mechanism from the questions included in CLEVR to ASP programs. By exploiting choice rules, we consider both deterministic and non-deterministic scene encodings. In addition, we introduce a confidence-based interface between the ASP module and the neural network, which allows us to restrict the non-determinism to objects classified by the network with high confidence. Our experiments show that the non-deterministic scene encoding achieves good results even when the neural networks are only poorly trained, in contrast to the deterministic approach. This robustness is important for building VQA systems whose network predictions are less than perfect.
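The confidence-based interface described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes network outputs are given as per-attribute probability distributions, and it emits ASP facts for high-confidence predictions while generating choice rules (the `1 { ... } 1` cardinality construct of clingo-style ASP) for low-confidence ones, so that only uncertain attributes remain non-deterministic. All names (`encode_scene`, the attribute and value labels, the threshold) are hypothetical.

```python
def encode_scene(predictions, threshold=0.9):
    """Translate network predictions into ASP rules.

    predictions: one dict per detected object, mapping each attribute
    name to a {value: probability} distribution (assumed format).
    Returns a list of ASP rule strings.
    """
    rules = []
    for obj_id, attrs in enumerate(predictions):
        for attr, dist in attrs.items():
            best_val, best_p = max(dist.items(), key=lambda kv: kv[1])
            if best_p >= threshold:
                # High confidence: encode deterministically as a fact.
                rules.append(f"{attr}({obj_id},{best_val}).")
            else:
                # Low confidence: a choice rule that makes the solver
                # pick exactly one candidate value for this attribute.
                choices = "; ".join(f"{attr}({obj_id},{v})" for v in dist)
                rules.append(f"1 {{ {choices} }} 1.")
    return rules
```

Raising the threshold pushes more attributes into choice rules (more non-determinism for the solver to resolve against the question's constraints); lowering it approaches the deterministic argmax encoding.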

