
Scalable transformer accelerator enables on-device execution of large language models

By Simon Osuji | July 22, 2025 | Artificial Intelligence
Differences in processes with and without hardware accelerators. Credit: Electronics (2024). DOI: 10.3390/electronics13234683

Large language models (LLMs) like BERT and GPT are driving major advances in artificial intelligence, but their size and complexity typically require powerful servers and cloud infrastructure. Running these models directly on devices—without relying on external computation—has remained a difficult technical challenge.


A research team at Sejong University has developed a new hardware solution that may help change that. The work is published in the journal Electronics.

Their Scalable Transformer Accelerator Unit (STAU) is designed to execute various transformer-based language models efficiently on embedded systems. It adapts dynamically to different input sizes and model structures, making it especially well-suited for real-time on-device AI.

At the heart of the STAU is a Variable Systolic Array (VSA) architecture, which performs matrix operations—the core workload in transformer models—in a way that scales with the input sequence length. By feeding input data row by row and loading weights in parallel, the system reduces memory stalls and improves throughput. This is particularly important for LLMs, where sentence lengths and token sequences vary widely between tasks.
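
To make the row-streaming idea concrete, here is a minimal NumPy sketch, not the STAU's actual RTL dataflow: weights are preloaded and held stationary while input rows are streamed one token per step, so the cost of a pass grows with the sequence length rather than being fixed to one tile size. The function name and loop structure are illustrative assumptions.

import numpy as np

def streamed_matmul(X, W):
    # Toy emulation of a weight-stationary pass: W is "preloaded"
    # once, then each input row (token) is streamed through and
    # reduced against every weight column, as a systolic array would.
    seq_len, d_in = X.shape
    assert W.shape[0] == d_in
    Y = np.zeros((seq_len, W.shape[1]))
    for t in range(seq_len):          # one token streamed per step
        for j in range(W.shape[1]):   # one accumulating "PE column" each
            Y[t, j] = X[t] @ W[:, j]
    return Y

# The sequence length can differ on every call; the work scales with it.
X = np.random.randn(7, 16)            # 7 tokens, model width 16
W = np.random.randn(16, 16)
assert np.allclose(streamed_matmul(X, W), X @ W)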

In benchmark tests published in Electronics, the accelerator demonstrated a 3.45× speedup over CPU-only execution while maintaining over 97% numerical accuracy. It also reduced total computation time by more than 68% when processing longer sequences.

Since then, continued optimizations have further improved the system’s performance: according to the team, recent internal tests achieved a speedup of up to 5.18×, highlighting the architecture’s long-term scalability.

Top module architecture. Credit: Electronics (2024). DOI: 10.3390/electronics13234683

Processing Element (PE) and Variable Systolic Array (VSA) architecture. Credit: Electronics (2024). DOI: 10.3390/electronics13234683

The researchers also re-engineered a critical part of the transformer pipeline: the softmax function. Typically a bottleneck because it depends on exponentiation and normalization, it was redesigned around a lightweight Radix-2 scheme built from shift-and-add operations, reducing hardware complexity without compromising output quality.
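
As a rough illustration of how shift-and-add can stand in for true exponentiation, the sketch below evaluates softmax in base 2: each score is rescaled so that e^x becomes 2^z, the integer part of z maps to a bit shift in hardware, and the fractional part gets a cheap linear correction (2^f ≈ 1 + f). This is an assumed approximation in the spirit of the paper's Radix-2 design, not its exact algorithm.

import numpy as np

def radix2_softmax(x):
    x = x - np.max(x)           # standard max-subtraction for stability
    z = x * np.log2(np.e)       # rescale so that 2**z equals e**x
    k = np.floor(z)             # integer part: a bit shift in hardware
    f = z - k                   # fractional part in [0, 1)
    p = (1.0 + f) * np.exp2(k)  # 2**f ~= 1 + f, a shift-and-add form
    return p / p.sum()

x = np.array([1.0, 2.0, 3.0])
print(radix2_softmax(x))        # close to np.exp(x) / np.exp(x).sum()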

To further simplify computation, the system uses a custom 16-bit floating-point format specifically tailored for transformer workloads. This format eliminates the need for layer normalization—another common performance bottleneck—and contributes to a more efficient, streamlined datapath.
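
The article does not give the bit layout of this custom format, so the following sketch assumes a bfloat16-style split (1 sign, 8 exponent, 7 mantissa bits) purely to show how a 16-bit type trades mantissa precision for a narrower, simpler datapath; the STAU's actual format may well differ.

import numpy as np

def encode16(x):
    # Truncate an IEEE float32 to its top 16 bits (hypothetical layout:
    # 1 sign, 8 exponent, 7 mantissa bits, as in bfloat16).
    return int(np.float32(x).view(np.uint32)) >> 16

def decode16(h):
    # Re-expand the 16-bit pattern to float32 by zero-filling the
    # dropped mantissa bits.
    return float(np.uint32(h << 16).view(np.float32))

x = 3.14159
h = encode16(x)
print(h, decode16(h))   # ~3.14, with reduced mantissa precision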

STAU was implemented on a Xilinx FPGA (VMK180) and controlled by an embedded Arm Cortex-R5 processor. This hybrid design allows developers to support a range of transformer models—including those used in LLMs—by simply updating software running on the processor, with no hardware modifications required.

The team sees their work as a step toward making advanced language models more accessible and deployable across a broader range of platforms—including mobile devices, wearables, and edge computing systems—where real-time AI execution, privacy, and low-latency response are essential.

“The STAU architecture shows that transformer models, even large ones, can be made practical for on-device applications,” said lead author Seok-Woo Chang. “It provides a foundation for building intelligent systems that are both scalable and efficient.”

More information:
Seok-Woo Chang et al, Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications, Electronics (2024). DOI: 10.3390/electronics13234683

Provided by
Sejong University

Citation:
Scalable transformer accelerator enables on-device execution of large language models (2025, July 21)
retrieved 21 July 2025
from https://techxplore.com/news/2025-07-scalable-enables-device-large-language.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.




