15 Feb 2024 • Jiaxin Zhang, Zhongzhi Li, Mingliang Zhang, Fei Yin, ChengLin Liu, Yashar Moshfeghi
To address this gap, we introduce the GeoEval benchmark, a comprehensive collection with four subsets: a main subset of 2,000 problems, a 750-problem subset focused on backward reasoning, an augmented subset of 2,000 problems, and a hard subset of 300 problems.