
Visual abilities of language models found to be lacking depth

by Simon Osuji
July 12, 2024
in Artificial Intelligence
VLMs cannot reliably count the intersections between the blue and red plots. Credit: arXiv (2024). DOI: 10.48550/arxiv.2407.06581

A trio of computer scientists at Auburn University in the U.S., working with a colleague from the University of Alberta in Canada, has found that claims about the visual skills of large language models with vision capabilities, known as vision language models (VLMs), may be overstated.


Pooyan Rahmanzadehgervi, Logan Bolton, Anh Totti Nguyen and Mohammad Reza Taesiri tested the visual abilities of four of the most popular VLMs: GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet. Their research is posted to the arXiv preprint server.

As large language models have evolved over the past year, they have gained new features, such as the ability to accept images as input. But these abilities have raised questions about the nature of visual ability in general.

As with animals, any human-built visual system must have two main components: a camera and a brain to process what the camera captures. In this new study, the researchers found that while the camera used to capture images may be highly developed, the processing of the data it produces is still in its early stages.

It is one thing to ask a language model to identify a building such as the Taj Mahal, and quite another to ask it questions about the things that appear in the image. For example, asking a language model how many of the children standing in front of the Taj Mahal are holding hands is tricky, because the model has not been taught to count; it has been taught to recognize things like hand-holding.

Thus, unless it has been shown images with the same number of children holding hands as in the picture, it has no reliable way of giving a correct answer.

The researchers demonstrated this lack of processing ability by asking the four VLMs to do things that are very simple for people, such as counting how many circles in a picture overlap or how many rings are interlocked.

Unsurprisingly, all four VLMs performed poorly; they did well only on images resembling familiar examples they had seen in training. For instance, they had difficulty counting interlocking rings when there were more than five, because, other than the Olympic rings, they had not seen such examples.

The team's work shows that vision language models still have a long way to go before they can process visual information on par with humans.

More information:
Pooyan Rahmanzadehgervi et al., Vision language models are blind, arXiv (2024). DOI: 10.48550/arxiv.2407.06581

Vision language models are blind: vlmsareblind.github.io/

Journal information:
arXiv

© 2024 Science X Network

Citation:
Visual abilities of language models found to be lacking depth (2024, July 12)
retrieved 12 July 2024
from https://techxplore.com/news/2024-07-visual-abilities-language-lacking-depth.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.





© 2023 LBNN - All rights reserved.

No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • Documentaries
  • Quizzes
    • Enneagram quiz
  • Newsletters
    • LBNN Newsletter
    • Divergent Capitalist

© 2023 LBNN - All rights reserved.