Psychology-based tasks assess multi-modal LLM visual cognition limits

By Simon Osuji
February 7, 2025
Artificial Intelligence


The “help or hinder” task, one of the tasks used to test the visual cognition of multimodal LLMs. Credit: MIT.

Over the past few decades, computer scientists have created increasingly advanced artificial intelligence (AI) models, some of which can match human performance on specific tasks. The extent to which these models truly “think” and analyze information the way humans do, however, remains a heated topic of discussion.


Researchers at the Max Planck Institute for Biological Cybernetics, the Institute for Human-Centered AI at Helmholtz Munich and the University of Tübingen recently set out to better understand the extent to which multi-modal large language models (LLMs), a promising class of AI models, grasp complex interactions and relationships in visual cognition tasks.

Their findings, published in Nature Machine Intelligence, show that while some LLMs perform well on tasks that entail processing and interpreting data, they often fail to capture intricacies that humans readily grasp.

“We were inspired by an influential paper by Brenden M. Lake and others, which outlined key cognitive components required for machine learning models to be considered human-like,” Luca M. Schulze Buschoff and Elif Akata, co-authors of the paper, told Tech Xplore.

“When we began our project, there was promising progress in vision language models that can process both language and images. However, many questions remained about whether these models could perform human-like visual reasoning.”

The main objective of the recent study by Buschoff, Akata and their colleagues was to assess the ability of multi-modal LLMs to grasp specific aspects of visual processing tasks, such as intuitive physics, causal relationships and the intuitive understanding of other people’s preferences. This could in turn help to shed light on the extent to which the capabilities of these models can actually be considered human-like.

To determine this, the researchers carried out a series of controlled experiments in which they tested the models on tasks derived from past psychology studies. This approach to testing AI was pioneered in an earlier paper by Marcel Binz and Eric Schulz, published in PNAS.

“For example, to test their understanding of intuitive physics, we gave the models images of block towers and asked them to judge whether a given tower is stable or not,” explained Buschoff and Akata.

“For causal reasoning and intuitive psychology, the models needed to infer relationships between events or understand the preferences of other agents. We then evaluated their basic performance and compared it with that of human participants who took part in the same experiments.”

By comparing the models’ responses on these tasks with those given by human participants, the researchers could better understand where the models were aligned with humans and where they fell short.

Overall, their findings showed that although some models were good at processing basic visual data, they still struggled to emulate more intricate aspects of human cognition.

“At this point it is not clear whether this is something that can be solved by scale and more diversity in the training data,” said Buschoff and Akata.

“This feeds into a larger debate on the kinds of inductive biases these models need to be outfitted with. For instance, some argue that these models need to be equipped with some basic processing modules such as a physics engine, so that they achieve a general and robust understanding of the physical world. This even goes back to findings in children showing that they can predict some physical processes from an early age.”

The recent work by Buschoff, Akata and their colleagues offers valuable new insight into the extent to which current state-of-the-art multi-modal LLMs exhibit human-like cognitive skills. So far, the team has tested models that were pre-trained on large datasets, but it would next like to run additional tests on models fine-tuned on the same types of tasks used in the experiments.

“Our early results with fine-tuning show that they do become a lot better at the specific task they are trained on,” added Buschoff and Akata.

“However, these improvements don’t always translate to a broader, more generalized understanding across different tasks, which is something that humans do remarkably well.”

More information:
Luca M. Schulze Buschoff et al, Visual cognition in multimodal large language models, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-024-00963-y.

© 2025 Science X Network

Citation:
Psychology-based tasks assess multi-modal LLM visual cognition limits (2025, February 6)
retrieved 7 February 2025
from https://techxplore.com/news/2025-02-psychology-based-tasks-multi-modal.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.




