
As artificial intelligence (AI) models grow more capable of understanding and processing long, complex text, the need for new semiconductor technologies that boost computation speed and memory efficiency at the same time is growing.
Against this backdrop, a joint team of KAIST researchers and international collaborators has developed a core AI semiconductor “brain” technology based on a hybrid transformer–Mamba structure. Implemented for the first time in a form that computes directly inside memory, it delivers roughly a four-fold increase in the inference speed of large language models (LLMs) and a 2.2-fold reduction in power consumption.
The team, led by Professor Jongse Park of the KAIST School of Computing in collaboration with the Georgia Institute of Technology in the United States and Uppsala University in Sweden, developed PIMBA, a core processing-in-memory (PIM) semiconductor technology that acts as the brain for next-generation AI models.
The research is to be presented at the 58th International Symposium on Microarchitecture (MICRO 2025) and is currently available on the arXiv preprint server.
Today’s LLMs, such as ChatGPT, GPT-4, Claude, Gemini, and Llama, are built on the transformer architecture, a “brain” structure that looks at all the words in a sequence simultaneously. As models grow and input sequences get longer, the computational load and memory requirements surge, making slower responses and high energy consumption major issues.
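As a rough illustration of why that cost explodes, here is a minimal sketch of single-head attention (toy code written for this article, not taken from the paper): the score matrix has one entry for every pair of tokens, so its size grows with the square of the sequence length.

```python
import numpy as np

def self_attention(x):
    """Toy single-head attention: every token attends to every token."""
    scores = x @ x.T                                # (L, L) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ x                              # mix values from all tokens

for L in (1_000, 4_000):
    x = np.random.randn(L, 64).astype(np.float32)
    self_attention(x)
    # The (L, L) score matrix is the quadratic term: 4,000 tokens already
    # means 16 million entries to compute and hold.
    print(f"{L} tokens -> score matrix with {L * L:,} entries")
```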
To overcome these transformer shortcomings, the recently proposed Mamba structure takes a sequential, memory-based approach that processes information over time, improving efficiency. Even so, memory bottlenecks and power-consumption limits remained.
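A heavily simplified sketch of that idea (again illustrative only; real Mamba layers use input-dependent, “selective” state-space parameters, which this toy omits): the model carries one fixed-size state through the sequence, so time grows linearly with length and memory per step stays constant.

```python
import numpy as np

def recurrent_scan(x, decay=0.9):
    """Toy sequential state update in the spirit of Mamba/state-space models."""
    state = np.zeros(x.shape[1], dtype=np.float32)   # fixed-size memory
    outputs = []
    for token in x:                                  # one left-to-right pass
        state = decay * state + (1 - decay) * token  # fold token into state
        outputs.append(state.copy())
    return np.stack(outputs)

x = np.random.randn(16_000, 64).astype(np.float32)
print(recurrent_scan(x).shape)  # (16000, 64); no L x L matrix ever appears
```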
Professor Park’s research team designed PIMBA, a new semiconductor structure that performs computations directly inside memory, to maximize the performance of the transformer–Mamba hybrid model, which combines the strengths of both architectures.
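The article does not spell out how the hybrid interleaves the two layer types, so the following is purely a hypothetical sketch of the general idea: a stack that uses cheap sequential-scan layers for most of its depth and an occasional attention layer for global context.

```python
import numpy as np

def attn_layer(x):
    """Global mixing, quadratic in sequence length (see the first sketch)."""
    s = x @ x.T
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ x

def scan_layer(x, decay=0.9):
    """Sequential mixing with a fixed-size carried state (see the second sketch)."""
    state, out = np.zeros(x.shape[1], dtype=x.dtype), []
    for token in x:
        state = decay * state + (1 - decay) * token
        out.append(state.copy())
    return np.stack(out)

def hybrid_forward(x, pattern=("scan", "scan", "attn", "scan")):
    """Hypothetical layer pattern; the real model's layout may differ."""
    for kind in pattern:
        x = attn_layer(x) if kind == "attn" else scan_layer(x)
    return x

x = np.random.randn(512, 64).astype(np.float32)
print(hybrid_forward(x).shape)  # (512, 64)
```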
While existing GPU-based systems move data out of memory to perform computations, PIMBA performs calculations directly inside the storage device, without moving the data. This minimizes data-movement time and significantly reduces power consumption.
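A back-of-the-envelope model of why that helps (the per-operation energy numbers below are placeholder assumptions for illustration, not measurements from the paper): off-chip data movement typically costs far more energy than the arithmetic itself, so doing the arithmetic where the data already lives removes the dominant term.

```python
GB = 1024**3

# Placeholder energy costs in picojoules -- assumptions, not measured values.
PJ_PER_BYTE_MOVED = 20.0   # off-chip DRAM transfer
PJ_PER_MAC = 1.0           # one multiply-accumulate

def gpu_style_energy(bytes_moved, macs):
    """Data is shipped out of memory to the compute units, then processed."""
    return bytes_moved * PJ_PER_BYTE_MOVED + macs * PJ_PER_MAC

def pim_style_energy(bytes_touched, macs, residual_traffic=0.05):
    """Most operands never leave the memory array; only results move."""
    return bytes_touched * residual_traffic * PJ_PER_BYTE_MOVED + macs * PJ_PER_MAC

b, m = 2 * GB, 4 * GB  # bytes touched and multiply-accumulates per step
print(f"GPU-style: {gpu_style_energy(b, m) / 1e12:.3f} J")
print(f"PIM-style: {pim_style_energy(b, m) / 1e12:.3f} J")
```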
As a result, PIMBA showed up to a 4.1-fold improvement in processing performance and an average 2.2-fold decrease in energy consumption compared to existing GPU systems.
More information: Wonung Kim et al, Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving, Proceedings of the 58th International Symposium on Microarchitecture (MICRO 2025). DOI: 10.1145/3725843.3756121. Preprint on arXiv: DOI: 10.48550/arXiv.2507.10178
Provided by The Korea Advanced Institute of Science and Technology (KAIST)
Citation: Semiconductor ‘brain’ combines transformer’s intelligence and Mamba’s efficiency (2025, October 17), retrieved 17 October 2025 from https://techxplore.com/news/2025-10-semiconductor-brain-combines-intelligence-mamba.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.