Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
Published in NeurIPS (2024) Track on Datasets and Benchmarks, 2024
Recommended citation: Hemmat, Arshia, et al. "Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models." The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. https://arxiv.org/abs/2411.06287
Abstract: Despite the importance of shape perception in human vision, early neural image classifiers relied less on shape information for object recognition than other (often spurious) features. While recent research suggests that current large Vision-Language Models (VLMs) exhibit more reliance on shape, we find them to still be seriously limited in this regard. To quantify such limitations, we introduce IllusionBench, a dataset that challenges current cutting-edge VLMs to decipher shape information when the shape is represented by an arrangement of visual elements in a scene. Our extensive evaluations reveal that, while these shapes are easily detectable by human annotators, current VLMs struggle to recognize them, indicating important avenues for future work in developing more robust visual perception systems. The full dataset and codebase are available at: https://arshiahemmat.github.io/illusionbench/
Recommended (bib) citation:
@inproceedings{hemmathidden, title={Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models}, author={Hemmat, Arshia and Davies, Adam and Lamb, Tom A and Yuan, Jianhao and Torr, Philip and Khakzar, Ashkan and Pinto, Francesco}, booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track}, year={2024} }