Self-trained vision transformers mimic human gaze with surprising precision

By Simon Osuji
May 26, 2025
in Artificial Intelligence
Comparison of gaze coordinates between human participants and attention heads of vision transformers (ViTs). Credit: Neural Networks (2025). DOI: 10.1016/j.neunet.2025.107595

Can machines ever see the world as we see it? Researchers have uncovered compelling evidence that vision transformers (ViTs), a type of deep-learning model that specializes in image analysis, can spontaneously develop human-like visual attention patterns when trained without labeled instructions.


Visual attention is the mechanism by which organisms, or artificial intelligence (AI), filter out "visual noise" to focus on the most relevant parts of an image or scene. While this ability comes naturally to humans, acquiring it spontaneously has proven difficult for AI.

However, researchers have revealed, in their recent publication in Neural Networks, that with the right training experience, AI can spontaneously acquire human-like visual attention without being explicitly taught to do so.

The research team, from the University of Osaka, compared human eye-tracking data to attention patterns generated by ViTs trained using DINO (“self-distillation with no labels”), a method of self-supervised learning that allows models to organize visual information without annotated datasets.
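
The comparison hinges on the fact that each attention head in a ViT produces its own spatial map: the attention the special CLS token pays to every image patch. As a rough illustration of where such per-head maps come from (a toy sketch with random weights and made-up dimensions, not the study's DINO-trained model, whose pretrained checkpoints the authors obtained separately), the CLS-to-patch attention of one transformer layer can be computed like this:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cls_attention_maps(tokens, w_q, w_k, n_heads):
    """Return one CLS->patch attention map per head.

    tokens: (1 + n_patches, dim) array, CLS token first, as in a ViT.
    w_q, w_k: (dim, dim) query/key projections (toy stand-ins for
    trained weights of a single transformer layer).
    """
    n_tok, dim = tokens.shape
    head_dim = dim // n_heads
    q = (tokens @ w_q).reshape(n_tok, n_heads, head_dim)
    k = (tokens @ w_k).reshape(n_tok, n_heads, head_dim)
    # Attention of the CLS token (query index 0) over all tokens, per head.
    logits = np.einsum("hd,thd->ht", q[0], k) / np.sqrt(head_dim)
    attn = softmax(logits, axis=-1)
    # Drop the CLS->CLS entry; the rest is a spatial saliency map.
    return attn[:, 1:]  # shape (n_heads, n_patches)

rng = np.random.default_rng(0)
dim, n_patches, n_heads = 64, 196, 8   # 14x14 patch grid, toy sizes
tokens = rng.standard_normal((1 + n_patches, dim))
maps = cls_attention_maps(tokens,
                          rng.standard_normal((dim, dim)),
                          rng.standard_normal((dim, dim)),
                          n_heads)
print(maps.shape)  # (8, 196): one saliency map per attention head
```

Reshaped to the 14x14 patch grid, each head's map can be overlaid on the input frame and compared with where human eyes actually land.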

Remarkably, the DINO-trained ViTs exhibited gaze behavior that closely mirrored that of typically developing adults when viewing dynamic video clips. In contrast, ViTs trained with conventional supervised learning showed unnatural visual attention.

“Our models didn’t just attend to visual scenes randomly; they spontaneously developed specialized functions,” says Takuto Yamamoto, lead author of the study. “One subset of the model consistently focused on faces, another captured the outlines of entire figures, and a third attended primarily to background features. This closely reflects how human visual systems segment and interpret scenes.”

Comparison of human gaze and DINO ViTs. The movie shows the gaze locations of human participants (adults with typical development, TD adults; n = 27) and DINO ViTs (24 G1 heads from 8- and 12-layer ViTs). Note the remarkable similarity between the red dots (TD adults) and cyan squares (DINO ViTs). Credit: Neural Networks (2025). DOI: 10.1016/j.neunet.2025.107595

Through detailed analyses, the team demonstrated that these attention clusters emerged naturally in the DINO-trained ViTs. These attention patterns were not only qualitatively similar to the human gaze, but also quantitatively aligned with established eye-tracking data, particularly in scenes involving human figures. The findings suggest a possible extension of the traditional, two-part figure–ground model of perception in psychology into a three-part model.
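
One way such a quantitative alignment can be measured (an illustrative choice on my part; the paper's exact metrics may differ, and the fixation data here is synthetic) is to histogram human fixations onto the same patch grid as the attention maps and correlate the two:

```python
import numpy as np

def gaze_density(fixations, grid=(14, 14)):
    """Histogram fixation points (normalized x, y in [0, 1]) into a
    patch-grid density map matching a ViT's 14x14 patch layout."""
    h, w = grid
    hist = np.zeros(grid)
    for x, y in fixations:
        i = min(int(y * h), h - 1)
        j = min(int(x * w), w - 1)
        hist[i, j] += 1
    return hist / hist.sum()

def map_similarity(attn_map, density):
    """Pearson correlation between an attention map and a gaze
    density map over the same grid (one common alignment metric)."""
    a, d = attn_map.ravel(), density.ravel()
    a = (a - a.mean()) / a.std()
    d = (d - d.mean()) / d.std()
    return float((a * d).mean())

rng = np.random.default_rng(1)
fixations = rng.random((50, 2))          # hypothetical fixation data
attn = rng.random((14, 14))
attn /= attn.sum()                       # normalized toy attention map
print(round(map_similarity(attn, gaze_density(fixations)), 3))
```

Scoring every head against human gaze this way, frame by frame, is what makes claims like "one head cluster tracks faces" testable rather than anecdotal.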

“What makes this result remarkable is that these models were never told what a face is,” explains senior author Shigeru Kitazawa. “Yet they learned to prioritize faces, probably because doing so maximized the information gained from their environment. It is a compelling demonstration that self-supervised learning may capture something fundamental about how intelligent systems, including humans, learn from the world.”

The study underscores the potential of self-supervised learning not only for advancing AI applications, but also for modeling aspects of biological vision. By aligning artificial systems more closely with human perception, self-supervised ViTs offer a new lens for interpreting both machine learning and human cognition.

The findings of this study could be used in a variety of applications, such as the development of human-friendly robots or enhanced support for early childhood development.

More information:
Takuto Yamamoto et al, Emergence of human-like attention and distinct head clusters in self-supervised vision transformers: A comparative eye-tracking study, Neural Networks (2025). DOI: 10.1016/j.neunet.2025.107595

Provided by
University of Osaka

Citation:
Self-trained vision transformers mimic human gaze with surprising precision (2025, May 26)
retrieved 26 May 2025
from https://techxplore.com/news/2025-05-vision-mimic-human-precision.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

