AI-powered software narrates surroundings for visually impaired in real time

By Simon Osuji | October 10, 2024 | Artificial Intelligence


Real-time descriptions of surroundings for people who are blind
As a user scans their phone camera around a room, WorldScribe will create brief audio descriptions of the objects recorded by the camera. Credit: Shen-Yun Lai, used with permission

A world of color and texture could soon become more accessible to people who are blind or have low vision, via new software that narrates what a camera records.


The tool, called WorldScribe, was designed by University of Michigan researchers and will be presented at the 2024 ACM Symposium on User Interface Software and Technology in Pittsburgh.

The study is titled “WorldScribe: Towards Context-Aware Live Visual Descriptions” and appears on the arXiv preprint server.

The tool uses generative AI (GenAI) language models to interpret the camera images and produce text and audio descriptions in real time to help users become aware of their surroundings more quickly. It can adjust the level of detail based on the user’s commands or the length of time that an object is in the camera frame, and the volume automatically adapts to noisy environments like crowded rooms, busy streets and loud music.
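The article doesn't spell out how that volume adaptation works. As a minimal illustrative sketch, an ambient-noise-to-gain mapping might look like the following; the function name, thresholds, and gain curve are assumptions, not WorldScribe's published method:

```python
import math

def speech_gain(ambient_rms: float, floor_rms: float = 0.01) -> float:
    """Map ambient noise level to a text-to-speech gain (hypothetical).

    ambient_rms: root-mean-square amplitude of the latest microphone
    buffer, normalized to 0..1. Returns a multiplier for the narration
    volume: 1.0 in a quiet room, rising with background noise and
    clamped at 2.5 so a busy street never produces painful output.
    """
    # Express the ambient level in decibels above a quiet-room floor.
    db_above_floor = 20 * math.log10(max(ambient_rms, floor_rms) / floor_rms)
    # Recover roughly half of the excess noise as extra speech gain.
    return min(1.0 + db_above_floor / 40, 2.5)
```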

Credit: Ruei-Che Chang

The tool will be demonstrated at 6:00 pm EST on Oct. 14, and a study of the tool, which organizers have identified as one of the best at the conference, will be presented at 3:15 pm EST on Oct. 16.

“For us blind people, this could really revolutionize the ways in which we work with the world in everyday life,” said Sam Rau, who was born blind and participated in the WorldScribe trial study.

“I don’t have any concept of sight, but when I tried the tool, I got a picture of the real world, and I got excited by all the color and texture that I wouldn’t have any access to otherwise,” Rau said.

“As a blind person, we’re sort of filling in the picture of what’s going on around us piece by piece, and it can take a lot of mental effort to create a bigger picture. But this tool can help us have the information right away, and in my opinion, helps us to just focus on being human rather than figuring out what’s going on. I don’t know if I can even impart in words what a huge miracle that truly is for us.”

Real-time descriptions of surroundings for people who are blind
When the user is moving slowly around the room, WorldScribe will use GPT-4 to create colorful descriptions of objects. When asked to help look for a laptop, the tool will prioritize detailed descriptions of any laptops in the room. Credit: Shen-Yun Lai, used with permission

During the trial study, Rau donned a headset equipped with a smartphone and walked around the research lab. The phone camera wirelessly transferred the images to a server, which almost instantly generated text and audio descriptions of objects in the camera frame: a laptop on a desk, a pile of papers, a TV and paintings mounted on the wall nearby.

The descriptions constantly changed to match whatever was in view of the camera, prioritizing objects that were closest to Rau. A brief glance at a desk produced a simple one-word description, but a longer inspection yielded information about the folders and papers arranged on top.
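That nearest-first ordering can be illustrated with a short sketch. The object records and distance estimates here are hypothetical, since the article doesn't describe how WorldScribe ranks proximity internally:

```python
import heapq

def narration_order(detections):
    """Return object labels nearest-first for narration (illustrative).

    detections: iterable of (label, estimated_distance_m) pairs, e.g.
    from a per-object depth estimate. The closest objects are spoken
    first, mirroring the prioritization Rau experienced.
    """
    heap = [(distance, label) for label, distance in detections]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# A laptop 0.8 m away is announced before a painting 3.5 m away.
print(narration_order([("painting", 3.5), ("laptop", 0.8), ("papers", 1.1)]))
```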

The tool can adjust the level of detail in its descriptions by switching between three different AI language models. The YOLO World model quickly generates very simple descriptions of objects that briefly appear in the camera frame. Detailed descriptions of objects that remain in the frame for a longer period of time are handled by GPT-4, the model behind ChatGPT. Another model, Moondream, provides an intermediate level of detail.
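A dwell-time router over those three tiers might look like the sketch below. The cutoff values and stub describers are assumptions for illustration; the study does not publish its actual switching thresholds:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Describer:
    name: str
    describe: Callable[[object], str]  # takes a camera frame, returns text

# Stub backends standing in for the three models named above.
yolo_world = Describer("YOLO World", lambda frame: "desk")
moondream = Describer("Moondream", lambda frame: "a wooden desk")
gpt4 = Describer("GPT-4", lambda frame: "a wooden desk with folders and papers stacked on top")

def pick_describer(dwell_seconds: float) -> Describer:
    """Route a frame by how long its object has stayed in view.

    The 1 s and 4 s cutoffs are illustrative, not WorldScribe's.
    """
    if dwell_seconds < 1.0:
        return yolo_world  # fast one-word label for fleeting objects
    if dwell_seconds < 4.0:
        return moondream   # intermediate level of detail
    return gpt4            # rich description for sustained attention

print(pick_describer(0.3).name)  # YOLO World
print(pick_describer(6.0).name)  # GPT-4
```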

“Many of the existing assistive technologies that leverage AI focus on specific tasks or require some sort of turn-by-turn interaction. For example, you take a picture, then get some result,” said Anhong Guo, an assistant professor of computer science and engineering and a corresponding author of the study.

“Providing rich and detailed descriptions for a live experience is a grand challenge for accessibility tools,” Guo said. “We saw an opportunity to use the increasingly capable AI models to create automated and adaptive descriptions in real-time.”

Because it relies on GenAI, WorldScribe can also respond to user-provided tasks or queries, such as prioritizing descriptions of any objects the user has asked it to find. However, some study participants noted that the tool had trouble detecting certain objects, such as an eyedropper bottle.

Rau says the tool is still a bit clunky for everyday use in its current state, but that he would use it every day if it could be integrated into smart glasses or another wearable device.

The researchers have applied for patent protection with the assistance of U-M Innovation Partnerships and are seeking partners to help refine the technology and bring it to market.

Guo is also an assistant professor of information within U-M’s School of Information.

More information:
Ruei-Che Chang et al, WorldScribe: Towards Context-Aware Live Visual Descriptions, arXiv (2024). DOI: 10.1145/3654777.3676375

Journal information:
arXiv

Provided by
University of Michigan

Citation:
AI-powered software narrates surroundings for visually impaired in real time (2024, October 10)
retrieved 10 October 2024
from https://techxplore.com/news/2024-10-ai-powered-software-narrates-visually.html

