Wednesday, July 16, 2025
LBNN

Meta introduces Chameleon, an early-fusion multimodal model

by Simon Osuji
May 22, 2024
in Artificial Intelligence


Chameleon represents all modalities (images, text, and code) as discrete tokens and uses a uniform transformer-based architecture that is trained from scratch in an end-to-end fashion on ∼10T tokens of interleaved mixed-modal data. As a result, Chameleon can both reason over and generate arbitrary mixed-modal documents. Text tokens are represented in green and image tokens in blue. Credit: arXiv (2024). DOI: 10.48550/arxiv.2405.09818

AI researchers at Meta, the company that owns Facebook, Instagram, WhatsApp, and many other products, have designed and built a multimodal model to compete with the likes of Google’s Gemini.

Called Chameleon, the new system is built on an early-fusion architecture, which allows it to commingle multiple types of input in ways not possible with most other systems.

The group, called the Chameleon Team, has written a paper describing their new model, including its architecture and how well it has performed during testing. It is posted on the arXiv preprint server.

Multimodal AI models, as their name implies, can accept more than one type of input in a single query. A user can submit a picture of a horse, for example, while also asking how many horses of its breed have won the Kentucky Derby.

To date, most such models have processed each type of data separately in the early stages of processing and only later brought the results together to look for associations, a technique called late fusion.
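As an illustration, here is a minimal sketch of late fusion in Python. It assumes two toy, hypothetical encoders (`encode_text` and `encode_image`) standing in for the separate modality-specific networks a real system would use; none of these names come from Meta's paper. Each input is encoded on its own, and the two feature vectors meet only at the end, in `late_fuse`, which here simply concatenates them.

```python
from typing import List

Vector = List[float]


def encode_text(text: str, dim: int = 4) -> Vector:
    """Hypothetical toy text encoder: folds character codes into `dim` buckets."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def encode_image(pixels: List[List[int]], dim: int = 4) -> Vector:
    """Hypothetical toy image encoder: mean brightness of each pixel row, folded into `dim` buckets."""
    vec = [0.0] * dim
    for r, row in enumerate(pixels):
        vec[r % dim] += sum(row) / (255.0 * len(row))
    return vec


def late_fuse(text_vec: Vector, image_vec: Vector) -> Vector:
    """Late fusion: the modalities were encoded independently; only their
    final feature vectors are combined, here by concatenation."""
    return text_vec + image_vec


question = "How many of this breed have won the Kentucky Derby?"
image = [[0, 128, 255], [64, 64, 64]]  # a tiny stand-in for the horse photo
fused = late_fuse(encode_text(question), encode_image(image))
print(len(fused))  # 8: four text features followed by four image features
```

Because the two feature vectors only meet inside `late_fuse`, neither encoder can condition on the other modality while it is processing its own input.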

This approach has been found to work well, but it limits how tightly the different types of data can be integrated. To overcome this, the team at Meta based their model on an early-fusion architecture.

This architecture allowed the team to interweave associations from the get-go. They accomplished this by converting images into discrete tokens, similar to the way LLMs parse words. The team also used a unified vocabulary of tokens drawn from different sources, including images, code, and text, and they claim this allows the same transformer-based computation to be applied to mixed types of input data.
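The tokenize-then-interleave idea can be sketched in a few lines of Python, under loud assumptions: the word-level `TEXT_VOCAB`, the four-entry `CODEBOOK` of two-value "patches", the begin/end-of-image marker tokens, and every function name are hypothetical toy stand-ins, not Chameleon's actual tokenizers (which are learned). What the sketch shows is the mechanism described above: image patches are quantized to discrete codes, those codes are shifted into the same id space as the text tokens, and text and images are interleaved into a single sequence that one transformer could consume.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical toy text vocabulary; real models use a learned subword tokenizer.
TEXT_VOCAB: Dict[str, int] = {"<unk>": 0, "how": 1, "many": 2, "won": 3,
                              "the": 4, "derby": 5, "?": 6}
# Hypothetical toy image codebook: each entry is a two-value "patch" prototype.
CODEBOOK: List[Tuple[float, float]] = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (0.0, 1.0)]

# One shared id space: text ids first, then image codes, then two marker tokens.
IMAGE_OFFSET = len(TEXT_VOCAB)
BOI = IMAGE_OFFSET + len(CODEBOOK)  # hypothetical begin-of-image marker
EOI = BOI + 1                       # hypothetical end-of-image marker
VOCAB_SIZE = EOI + 1

Patch = Tuple[float, float]
Segment = Union[str, List[Patch]]


def tokenize_text(text: str) -> List[int]:
    """Split on whitespace (treating '?' as its own word) and look up each word."""
    words = text.lower().replace("?", " ?").split()
    return [TEXT_VOCAB.get(w, TEXT_VOCAB["<unk>"]) for w in words]


def quantize_patch(patch: Patch) -> int:
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    return min(range(len(CODEBOOK)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(patch, CODEBOOK[k])))


def tokenize_image(patches: Sequence[Patch]) -> List[int]:
    """Turn an image (here, a list of patches) into ids in the shared vocabulary."""
    return [BOI] + [IMAGE_OFFSET + quantize_patch(p) for p in patches] + [EOI]


def build_sequence(segments: Sequence[Segment]) -> List[int]:
    """Interleave text and image segments into ONE token sequence, so a single
    model sees both modalities from its first layer (early fusion)."""
    ids: List[int] = []
    for seg in segments:
        ids.extend(tokenize_text(seg) if isinstance(seg, str) else tokenize_image(seg))
    return ids


seq = build_sequence([[(0.9, 1.0), (0.1, 0.0)], "How many won the Derby?"])
print(seq)  # [11, 9, 7, 12, 1, 2, 3, 4, 5, 6]
```

Offsetting the image codes by the size of the text vocabulary is what makes the vocabulary "unified": every id, whether it came from words or pixels, indexes the same embedding table, so no separate image pathway is needed before the sequence reaches the model.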

The researchers note that, unlike Gemini, Chameleon is an end-to-end model, which removes the need for separate image decoders. They also developed new training techniques that allow the model to work with multiple types of tokens, involving two-stage learning and a massive dataset of approximately 4.4 trillion tokens of text, image-text pairs, and interleaved data. The team trained two versions of the system, with 7 billion and 34 billion parameters, using more than 5 million hours of high-speed GPU time.

The result, the research team claims, is a model that can accept text only, images only, or a combination of both and return intelligent answers and associations with better accuracy than its competitors.

More information:
Chameleon: Mixed-Modal Early-Fusion Foundation Models, arXiv (2024). DOI: 10.48550/arxiv.2405.09818

Journal information:
arXiv

© 2024 Science X Network

Citation:
Meta introduces Chameleon, an early-fusion multimodal model (2024, May 22)
retrieved 22 May 2024
from https://techxplore.com/news/2024-05-meta-chameleon-early-fusion-multimodal.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.


© 2023 LBNN - All rights reserved.