Visual abilities of language models found to be lacking depth

by Simon Osuji
July 12, 2024
in Artificial Intelligence

VLMs cannot reliably count the intersections between the blue and red plots. Credit: arXiv (2024). DOI: 10.48550/arxiv.2407.06581

A trio of computer scientists at Auburn University, in the U.S., working with a colleague from the University of Alberta, in Canada, has found that claims about the visual skills of large language models (LLMs) with vision capabilities, known as vision language models (VLMs), may overstate what the models can actually do.

Pooyan Rahmanzadehgervi, Logan Bolton, Anh Totti Nguyen and Mohammad Reza Taesiri tested the visual abilities of four of the most popular VLMs: GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet. The research is posted to the arXiv preprint server.

As large language models have evolved over the past year, new features have been added, such as the ability to accept visual input. But such abilities have led to questions regarding the nature of visual ability in general.

As with animals, any human-built visual system must have two main components: a camera, and a brain to process what the camera captures. In this new study, the researchers found that while the camera used to capture images may be highly developed, the processing of the data it produces is still in its early stages.

It is one thing to ask a language model to identify a building such as the Taj Mahal, and quite another to ask it questions about the nature of things in the image. For example, asking the model how many children standing in front of the Taj Mahal are holding hands is tricky because the model has not been taught to count; it has been taught to recognize things like hand-holding.

Thus, unless it has been shown images of the same number of children holding hands as shown in the picture, it will have no way of giving a correct answer.

The researchers demonstrated this lack of processing ability by asking the four VLMs to do things that are very simple for people, such as counting how many circles in a picture overlap or how many rings are interconnected.
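
To give a sense of how simple such a task is once the shapes are known, here is a minimal Python sketch of counting overlapping circles. It is an illustration only, not the researchers' benchmark code: the function name `count_overlapping_pairs` and the `(x, y, radius)` input format are assumptions made for this example, and it treats two circles as overlapping when the distance between their centers is less than the sum of their radii.

```python
from itertools import combinations
from math import hypot


def count_overlapping_pairs(circles):
    """Count pairs of circles that overlap.

    `circles` is a list of (x, y, radius) tuples (a hypothetical format
    chosen for this sketch). Two circles count as overlapping when their
    centers are closer than the sum of their radii; circles that merely
    touch at a single point are not counted.
    """
    count = 0
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        if hypot(x2 - x1, y2 - y1) < r1 + r2:
            count += 1
    return count


# Two circles that overlap and a third far away from both.
example = [(0, 0, 1), (1.5, 0, 1), (5, 5, 1)]
print(count_overlapping_pairs(example))
```

A few lines of geometry settle the question when the coordinates are given; the difficulty for a VLM is recovering that information from the pixels in the first place.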

Unsurprisingly, all four models performed poorly; they did well only when the picture showed something familiar from their training. For example, they had difficulty working out how many rings were interlocking when there were more than five, because other than the Olympic rings, they had not seen such examples.

The team's work shows that large language models have a long way to go before they can process visual information on par with humans.

More information:
Pooyan Rahmanzadehgervi et al, Vision language models are blind, arXiv (2024). DOI: 10.48550/arxiv.2407.06581

Vision language models are blind: vlmsareblind.github.io/

Journal information:
arXiv

© 2024 Science X Network

Citation:
Visual abilities of language models found to be lacking depth (2024, July 12)
retrieved 12 July 2024
from https://techxplore.com/news/2024-07-visual-abilities-language-lacking-depth.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.