AI generates data to help embodied agents ground language to 3D world

by Simon Osuji
June 16, 2025
in Artificial Intelligence
A new 3D-text dataset, 3D-GRAND, leverages generative AI to create synthetic rooms that are automatically annotated with 3D structures. The dataset’s 40,087 household scenes can help train embodied AI, like household robots, connect language to 3D spaces. Credit: Joyce Chai

A new, densely annotated 3D-text dataset called 3D-GRAND can help train embodied AI, like household robots, to connect language to 3D spaces. The study, led by University of Michigan researchers, was presented at the Computer Vision and Pattern Recognition (CVPR) Conference in Nashville, Tennessee on June 15, and published on the arXiv preprint server.


When tested against models trained on previous 3D datasets, the model trained on 3D-GRAND reached 38% grounding accuracy, surpassing the previous best model by 7.7 percentage points. 3D-GRAND also drastically reduced hallucinations, to only 6.67% from the previous state-of-the-art rate of 48%.

The dataset contributes to the next generation of household robots that will far exceed the robotic vacuums that currently populate homes. Before we can command a robot to “pick up the book next to the lamp on the nightstand and bring it to me,” the robot must be trained to understand what language refers to in space.

“Large multimodal language models are mostly trained on text with 2D images, but we live in a 3D world. If we want a robot to interact with us, it must understand spatial terms and perspectives, interpret object orientations in space, and ground language in the rich 3D environment,” said Joyce Chai, a professor of computer science and engineering at U-M and senior author of the study.

While text or image-based AI models can pull an enormous amount of information from the internet, 3D data is scarce. It’s even harder to find 3D data with grounded text data—meaning specific words like “sofa” are linked to 3D coordinates bounding the actual sofa.
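The idea of grounded text data can be made concrete with a small sketch. The record format below is hypothetical (it is not 3D-GRAND's actual schema): each noun phrase in a scene description is linked to its character span in the text and to an axis-aligned 3D bounding box in the room.

```python
# A minimal sketch of a densely grounded annotation (hypothetical format, not
# 3D-GRAND's actual schema): each noun phrase in the description is linked to
# its character span and to an axis-aligned 3D box (center x, y, z and size
# w, h, d, e.g. in meters).
annotation = {
    "scene_id": "room_0001",
    "description": "A grey sofa sits next to the wooden coffee table.",
    "groundings": [
        {"phrase": "grey sofa",
         "span": (2, 11),   # character offsets into the description
         "bbox": {"center": (1.2, 0.4, 2.0), "size": (1.8, 0.8, 0.9)}},
        {"phrase": "wooden coffee table",
         "span": (29, 48),
         "bbox": {"center": (1.1, 0.3, 0.8), "size": (0.9, 0.4, 0.6)}},
    ],
}

# Sanity check: every grounded phrase appears verbatim at its recorded span.
for g in annotation["groundings"]:
    start, end = g["span"]
    assert annotation["description"][start:end] == g["phrase"]
```

With annotations of this shape, "sofa" is no longer just a token: it is tied to the coordinates of the actual sofa, which is what lets a model learn to point at objects rather than merely name them.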

Like all LLMs, 3D-LLMs perform best when trained on large datasets. However, building a large dataset by imaging rooms with cameras would be time-intensive and expensive, since annotators must manually specify objects and their spatial relationships and link words to their corresponding objects.

The research team took a new approach, leveraging generative AI to create synthetic rooms that are automatically annotated with 3D structures. The resulting 3D-GRAND dataset includes 40,087 household scenes paired with 6.2 million densely grounded room descriptions.

“A big advantage of synthetic data is that labels come for free because you already know where the sofa is, which makes the curation process easier,” said Jianing Jed Yang, a doctoral student of computer science and engineering at U-M and lead author of the study.

After generating the synthetic 3D data, an AI pipeline first used vision models to describe each object’s color, shape and material. From here, a text-only model generated descriptions of entire scenes while using scene graphs—structured maps of how objects relate to each other—to ensure each noun phrase is grounded to specific 3D objects.
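A toy scene graph illustrates the structure the pipeline relies on (the object names, attributes, and rendering function here are invented for illustration, not taken from the paper): nodes carry per-object attributes from the vision models, edges carry spatial relations, and a text model prompted with the graph can emit sentences whose noun phrases map back to specific object IDs.

```python
# A toy scene graph (hypothetical, for illustration only): nodes are objects
# with attributes a vision model might produce; edges are spatial relations.
scene_graph = {
    "objects": {
        "obj_1": {"category": "nightstand", "color": "white", "material": "wood"},
        "obj_2": {"category": "lamp", "color": "brass", "material": "metal"},
        "obj_3": {"category": "book", "color": "red", "material": "paper"},
    },
    "relations": [
        ("obj_2", "on top of", "obj_1"),
        ("obj_3", "next to", "obj_2"),
    ],
}

def relation_sentences(graph):
    """Render each relation as a sentence, keeping object ids as grounding tags."""
    out = []
    for subj, rel, obj in graph["relations"]:
        s, o = graph["objects"][subj], graph["objects"][obj]
        out.append(f"The {s['color']} {s['category']} [{subj}] is {rel} "
                   f"the {o['color']} {o['category']} [{obj}].")
    return out

for sentence in relation_sentences(scene_graph):
    print(sentence)
```

Because every generated phrase carries the ID of the object it came from, the link between language and geometry is preserved by construction rather than recovered after the fact.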

A final quality control step used a hallucination filter to ensure each object generated in the text actually has an associated object in the 3D scene.
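The core check behind such a filter can be sketched in a few lines. This is a simplified stand-in (the paper's filter is more sophisticated): assuming generated text tags each grounded phrase with a bracketed object ID, a sentence is rejected if it references any ID that does not exist in the scene.

```python
import re

# Object ids present in the synthetic scene (hypothetical example values).
scene_object_ids = {"obj_1", "obj_2", "obj_3"}

def passes_filter(sentence: str) -> bool:
    """Reject the sentence if any bracketed grounding tag, e.g. [obj_7],
    names an object that does not exist in the 3D scene."""
    referenced = set(re.findall(r"\[(\w+)\]", sentence))
    return referenced <= scene_object_ids

print(passes_filter("The red book [obj_3] lies next to the lamp [obj_2]."))  # True
print(passes_filter("A green plant [obj_7] sits on the table [obj_1]."))     # False
```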

Human evaluators spot-checked 10,200 room-annotation pairs to ensure reliability by assessing whether there were any inaccuracies in AI-generated sentences or objects. The synthetic annotations had a low error rate of about 5% to 8%, which is comparable to professional human annotations.

“Given the size of the dataset, the LLM-based annotation reduces both the cost and time by an order of magnitude compared to human annotation, creating 6.2 million annotations in just two days. It is widely recognized that collecting high-quality data at scale is essential for building effective AI models,” said Yang.

To put the new dataset to the test, the research team trained a model on 3D-GRAND and compared it with three baseline models (3D-LLM, LEO and 3D-VISTA). The ScanRefer benchmark evaluated grounding accuracy—how much the predicted bounding box overlaps with the true object boundary—while a newly introduced benchmark called 3D-POPE evaluated object hallucinations.
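The overlap measure behind grounding accuracy is intersection-over-union (IoU); ScanRefer counts a prediction as correct when its IoU with the ground-truth box exceeds a threshold (it reports accuracy at IoU 0.25 and 0.5). For axis-aligned 3D boxes it can be computed directly:

```python
def iou_3d(box_a, box_b):
    """3D intersection-over-union for axis-aligned boxes.
    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    (ax0, ay0, az0), (ax1, ay1, az1) = box_a
    (bx0, by0, bz0), (bx1, by1, bz1) = box_b
    # Overlap along each axis (zero when the boxes are disjoint on that axis).
    dx = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    dy = max(0.0, min(ay1, by1) - max(ay0, by0))
    dz = max(0.0, min(az1, bz1) - max(az0, bz0))
    inter = dx * dy * dz
    vol_a = (ax1 - ax0) * (ay1 - ay0) * (az1 - az0)
    vol_b = (bx1 - bx0) * (by1 - by0) * (bz1 - bz0)
    return inter / (vol_a + vol_b - inter) if inter > 0 else 0.0

# Two unit cubes offset by half a side: intersection 0.5, union 1.5, IoU ≈ 0.333
print(iou_3d(((0, 0, 0), (1, 1, 1)), ((0.5, 0, 0), (1.5, 1, 1))))
```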

The model trained on 3D-GRAND reached a 38% grounding accuracy with only a 6.67% hallucination rate, far exceeding the competing generative models. While 3D-GRAND contributes to the 3D-LLM modeling community, testing on robots will be the next step.

“It will be exciting to see how 3D-GRAND helps robots better understand space and take on different spatial perspectives, potentially improving how they communicate and collaborate with humans,” said Chai.

More information:
Jianing Yang et al, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, arXiv (2024). DOI: 10.48550/arxiv.2406.05132

Journal information:
arXiv

Provided by
University of Michigan College of Engineering

Citation:
AI generates data to help embodied agents ground language to 3D world (2025, June 16)
retrieved 16 June 2025
from https://techxplore.com/news/2025-06-ai-generates-embodied-agents-ground.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


