ChatGPT’s New Upgrade Teases AI’s Multimodal Future

by Simon Osuji
October 2, 2023
in Artificial Intelligence



ChatGPT isn’t just a chatbot anymore.

OpenAI’s latest upgrade grants ChatGPT powerful new abilities that go beyond text. It can tell bedtime stories in its own AI voice, identify objects in photos, and respond to audio recordings. These capabilities represent the next big thing in AI: multimodal models.

“Multimodal is the next generation of these large models, where it can process not just text, but also images, audio, video, and even other modalities,” says Dr. Linxi “Jim” Fan, Senior AI Research Scientist at Nvidia.

ChatGPT gets an eyes-and-ears power-up

ChatGPT’s upgrade is a noteworthy example of a multimodal AI system. Instead of relying on a single AI model built for a single form of input, such as a large language model (LLM) or a speech-to-text model, it combines multiple models into a more cohesive AI tool.

OpenAI provides three specific multimodal features. Users can prompt the chatbot with images or voice, and can receive spoken responses in one of five AI-generated voices. Image input is available on all platforms, while voice is limited to the ChatGPT apps for Android and iOS.

A demo from OpenAI shows ChatGPT being used to adjust a bike seat. A befuddled cyclist first snaps a photo of their bike and asks for help lowering the seat, then follows up with photos of the bike’s user manual and a toolset. ChatGPT responds with text describing the best tool for the job and how to use it.

These multimodal features aren’t entirely new. GPT-4 launched in March 2023 with the ability to understand image prompts, and some OpenAI partners, including Microsoft with Bing Chat, put that capability into practice. But tapping these features required API access, so they were generally reserved for partners and developers.

Image: GPT-4’s multimodal features appeared in Bing Chat in the summer of 2023. (Credit: Microsoft)

These features are now available to everyone willing to pay $20 a month for a ChatGPT Plus subscription, and their integration into ChatGPT’s friendly interface is another perk. Image input is as simple as opening the app and tapping an icon to snap a photo.

Simplicity is multimodal AI’s killer feature. Current AI models for images, videos, and voice are impressive, but finding the right model for each task can be time-consuming, and moving data between models is a chore. Multimodal AI eliminates these problems. A user can prompt the AI agent with various media, then seamlessly switch between images, text, and voice prompts within the same conversation.

“This points to the future of these tools, where they can provide us almost anything we want in the moment,” says Kyle Shannon, Founder & CEO of AI video platform Storyvine. “The future of generative AI is hyper personalization. This will happen for knowledge workers, creatives, and end users.”

Is multimodal the future?

ChatGPT’s support for image and voice is just a taste of what’s to come.

“While there aren’t any good models for it right now, in principle you can give it 3D data, or even something like digital smell data, and it can output images, videos, and even actions,” says Dr. Fan. “I do research at Nvidia on game AI, and robotics, and multimodal models are critical for these efforts.”

Image and voice input are a natural starting point for ChatGPT’s multimodal capabilities. ChatGPT is a user-facing app, and these are two of the most common forms of data a user might want to share. But there’s no reason an AI model can’t be trained to handle other forms of data, whether that’s an Excel spreadsheet, a 3D model, or a photograph with depth data.

That’s not to say it’s easy. Organizations looking to build multimodal AI face many challenges. The biggest, perhaps, is wrangling the vast amounts of data required to train a roster of AI models.

“I think multimodal models will have roughly the same landscape as the current large language models,” says Fan. “It’s very capital intense. And it’s probably even worse for multimodal, because consider how much data is in the images, and in the videos.”

That would seem to give the edge to OpenAI and other well-heeled AI startups, such as Anthropic, creator of Claude.ai, which recently struck a deal with Amazon worth up to $4 billion.

But it’s too soon to count out smaller organizations. Fan says research into multimodal AI is less mature than research into LLMs, leaving room for researchers to find new techniques. Shannon agrees and expects innovation from all sides, citing the rapid iteration and improvement of “open-source” large language models like Meta’s LLaMA 2.

“I think there will always be a pendulum between general [AI] tools and specialty tools,” says Shannon. “What changes is that now we have the possibility of truly general tools. The specialization can be a choice rather than a requirement.”

