AI models learn to split up tasks, slashing wait times for complex prompts

By Simon Osuji
July 21, 2025
in Artificial Intelligence

Credit: Pixabay/CC0 Public Domain

As large language models (LLMs) like ChatGPT continue to advance, user expectations keep growing, including for how quickly they can respond to increasingly intricate prompts posing ever more challenging problems and tasks.

Conventional LLMs rely on the concept of “autoregressive decoding,” where each item (“token”) in a sequence is predicted based on previously generated outputs. This approach inevitably leads to delays for more complicated prompts, though researchers have tried to mitigate this with projects that leverage the parallelism of multicore computer chips more effectively. For example, speculative decoding uses a fast draft model to propose tokens that are then verified in parallel by a slower, high-quality model.
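To make that baseline concrete, here is a minimal Python sketch of speculative decoding. The `draft_model` and `target_model` callables are toy stand-ins invented for illustration; real systems accept or reject draft tokens probabilistically from model logits and verify all of them in a single batched forward pass.

```python
# Minimal sketch of speculative decoding over toy integer "tokens".
# draft_model/target_model are hypothetical stand-ins: each maps a token
# sequence to its next token. Real systems compare probabilities from
# logits; this toy version simplifies acceptance to exact agreement.

def speculative_decode(target_model, draft_model, prompt, k=4, max_len=12):
    tokens = list(prompt)
    while len(tokens) < max_len:
        # 1) The fast draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            ctx.append(draft_model(ctx))
            proposal.append(ctx[-1])
        # 2) The slow target model checks every proposed position; on real
        #    hardware these checks run as one parallel (batched) pass.
        accepted, correction = 0, None
        for i in range(k):
            expected = target_model(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted += 1
            else:
                correction = expected  # target overrides the draft here
                break
        tokens.extend(proposal[:accepted])
        if correction is not None:
            tokens.append(correction)
    return tokens[:max_len]

# Toy models: the "true" sequence counts upward mod 10; the draft model
# is usually right but stumbles after a 5, forcing a correction.
target_model = lambda ctx: (ctx[-1] + 1) % 10
draft_model = lambda ctx: 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

print(speculative_decode(target_model, draft_model, [0]))  # [0, 1, 2, ...]
```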

A newer class of methods instead exploits “semantic independence,” identifying syntactic patterns like bullet points and expanding each in parallel. But they rely on hand-crafted syntactic heuristics, which are brittle and often fail when responses deviate from expected formats.
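As a rough illustration of that heuristic style (in the spirit of the Skeleton-of-Thought approach discussed below), the sketch here splits a skeleton answer on bullet markers and expands each point concurrently; `expand_point` is a hypothetical stand-in for a per-point LLM call.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hand-crafted syntactic heuristic: treat numbered/bulleted lines as
# independent chunks and expand them in parallel. expand_point is a
# hypothetical placeholder for a per-chunk LLM call.

def expand_point(point: str) -> str:
    return f"{point}: ...expanded by the model..."

def parallel_expand(skeleton: str) -> str:
    # Brittle by construction: only "1. ..." or "- ..." lines match; a
    # response in any other format defeats the heuristic entirely.
    points = re.findall(r"^\s*(?:\d+\.|-)\s+(.*\S)", skeleton, re.MULTILINE)
    with ThreadPoolExecutor() as pool:
        return "\n".join(pool.map(expand_point, points))

skeleton = "1. Define the terms\n2. Work an example\n3. Summarize"
print(parallel_expand(skeleton))
```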

These shortcomings inspired researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and Google to use a learning-based approach to parallel decoding. Instead of relying on fixed rules, their method trains LLMs to recognize semantic independence—that is, to identify and decode semantically independent chunks of text in parallel.

The result: pasta.

Specifically, the CSAIL team’s Parallel Structure Annotation (PASTA) enables LLMs to generate text in parallel, dramatically accelerating their response times. Unlike previous attempts that relied on rigid, hand-coded rules to identify independent text segments, PASTA teaches LLMs to inherently understand and express these parallelization opportunities within their own responses.

This approach—called learned asynchronous decoding—marks a shift toward teaching models to orchestrate their own parallel decoding strategy. The findings are published on the arXiv preprint server.

“Traditional LLMs are like a single cook making lasagna, one step at a time,” explained Tian Jin, lead author of a new paper on the project that was presented at the International Conference on Machine Learning (ICML 2025) in Vancouver. “PASTA teaches the cook to recognize when different parts of the lasagna can be prepared simultaneously, like mixing a subset of ingredients while the oven preheats, leading to a much faster process overall.”

This innovation tackles a fundamental bottleneck in LLM inference, where the sequential nature of decoding often results in underutilized hardware and lengthy wait times for users. Current LLMs can take seconds or even minutes to fulfill user requests, a latency issue that PASTA aims to resolve.

At the heart of PASTA are two main components: PASTA-LANG, an annotation language that allows LLMs to tag semantically independent parts of their responses, and an interpreter that acts on these tags to orchestrate parallel decoding during inference. As Jin explains, you can think of PASTA-LANG as a set of instructions the LLM writes for itself, marking sections of its output that can be worked on simultaneously. The interpreter then reads these instructions and manages the parallel generation of those sections.
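The exact PASTA-LANG syntax lives in the paper; the sketch below invents a simplified `<async topic="..."/>` tag purely to illustrate the division of labor between annotations and interpreter, with `decode_chunk` as a hypothetical stand-in for a model decoding call.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Illustrative interpreter loop. The <async topic="..."/> tag is a
# simplified invention for this sketch, not the actual PASTA-LANG syntax.
# decode_chunk stands in for decoding one independent chunk with the model.

def decode_chunk(topic: str) -> str:
    return f"[generated text about {topic}]"  # placeholder for model decoding

def interpret(annotated: str) -> str:
    # Find the model's self-annotations marking independent chunks.
    topics = re.findall(r'<async topic="([^"]+)"/>', annotated)
    # Decode all marked chunks concurrently...
    with ThreadPoolExecutor() as pool:
        chunks = dict(zip(topics, pool.map(decode_chunk, topics)))
    # ...then splice each result back in place of its tag.
    return re.sub(r'<async topic="([^"]+)"/>',
                  lambda m: chunks[m.group(1)], annotated)

annotated = ('Here are two points. '
             '<async topic="pros"/> <async topic="cons"/>')
print(interpret(annotated))
```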

The team trained LLMs to generate these PASTA-LANG annotations through a two-stage fine-tuning process. This training not only optimizes for decoding speed but also approximately maintains or even improves the quality of the generated responses. This dual optimization is a significant leap forward, as it enables continuous improvement in both speed and quality as more training compute becomes available.

In experiments with PASTA on the AlpacaEval benchmark, the team's self-parallelizing model showed geometric-mean speedups approaching 2x, with only minor changes in response quality (ranging from a 2% gain to a 7% drop). This means users can expect responses nearly twice as fast without a noticeable decrease in accuracy or coherence.
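For readers unfamiliar with the metric, a geometric mean is the standard way to average speedup ratios; here is a quick worked example with made-up numbers, not the paper's measurements.

```python
import math

# Geometric mean of per-prompt speedups; illustrative numbers only.
speedups = [1.6, 2.3, 1.9, 2.1]
geo_mean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
print(f"geometric-mean speedup: {geo_mean:.2f}x")  # ~1.96x
```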

“It was surprising to see this behavior of having an LLM orchestrate its own inference-time behavior,” Jin says. “It was illuminating—and in a way, magical—to see how throwing more compute at these algorithms yields increasingly sophisticated self-orchestration behavior.”

The research highlights a critical challenge in the field: balancing speed and quality. Prior methods such as Skeleton-of-Thought (SoT) and APAR attempted parallel decoding by looking for manually specified syntactic structures like bullet points or paragraphs. However, these methods were often rigid and imprecise, failing to identify parallelization opportunities when responses deviated even slightly from expected patterns. PASTA’s learning-based approach, in contrast, offers a more robust and scalable solution.

“It’s about empowering the LLM to be smarter about how it generates content,” says Jin, a Ph.D. student at CSAIL. “Instead of us trying to guess where it can work in parallel, we’re teaching the LLM to identify those opportunities itself, on the fly.”

Looking ahead, the team is optimistic about the broader implications of PASTA. The ability to significantly reduce LLM decoding latency could lead to reduced computational resource requirements, making these powerful AI models more accessible and affordable to a wider range of users and applications.

“We’ve essentially designed a protocol for an LLM to optimize itself,” says Jin. “By improving the efficacy of LLM inference, PASTA could significantly reduce computational resource requests and improve accessibility of LLMs.”

Jin spearheaded the project alongside his two faculty advisers, MIT professors Michael Carbin and Jonathan Ragan-Kelley. Other paper co-authors include CSAIL’s Ellie Y. Cheng and Zack Ankner, and Google researchers Suvinay Subramanian, Nikunj Saunshi, Blake M. Elias, and Amir Yazdanbakhsh.

More information:
Tian Jin et al, Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding, arXiv (2025). DOI: 10.48550/arxiv.2502.11517

Journal information:
arXiv

Provided by
Massachusetts Institute of Technology

Citation:
AI models learn to split up tasks, slashing wait times for complex prompts (2025, July 21)
retrieved 21 July 2025
from https://techxplore.com/news/2025-07-ai-tasks-slashing-complex-prompts.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.




