Reinforcement learning boosts reasoning skills in new diffusion-based language model d1

d1 uses reinforcement learning to enhance the reasoning capabilities of dLLMs. Shown: log probability estimation in diffu-GRPO. Credit: *arXiv* (2025). DOI: 10.48550/arxiv.2504.12216

A team of AI researchers at the University of California, Los Angeles, working with a colleague from Meta AI, has introduced d1, a framework that uses reinforcement learning to improve the reasoning of diffusion-based large language models (dLLMs). The group posted a paper describing their work and the features of the new framework on the arXiv preprint server.

Over the past couple of years, the use of LLMs has skyrocketed, with millions of people around the world using AI apps for a wide variety of tasks. This has driven demand for large amounts of electricity to power the data centers that run these compute-intensive applications. Researchers have therefore been looking for less demanding ways to provide AI services. One such approach uses dLLMs as either a replacement for, or a complement to, conventional LLMs.

Diffusion-based LLMs (dLLMs) are AI models that arrive at answers differently than conventional LLMs. Instead of taking the autoregressive approach, in which text is generated one token at a time from left to right, they use diffusion to find answers. Diffusion models were originally used to generate images: they were trained by gradually adding noise to an image until it was unrecognizable and then learning to reverse the process until nothing was left but the original image.

Applying this approach to text involved converting letters or words to tokens as an analog for pixels, and using masks as an analog for noise. Tokens are gradually replaced with masks until nothing is left but mask tokens, and the model is then trained to reverse the process until nothing is left but the original tokens. The advantage of this approach is that it can require far less computing power than conventional LLMs.
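The masking idea described above can be illustrated with a small Python sketch. It is not the researchers' code: the `[MASK]` string, the function name `forward_process`, and the simple step schedule are all assumptions chosen for illustration, and real dLLMs work on token IDs inside a neural network rather than on plain strings.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol; real models use a dedicated token id


def forward_process(tokens, steps, seed=0):
    """Return a list of progressively more masked copies of `tokens`.

    The first entry is the original sequence and the last entry is fully
    masked. The schedule used here is an illustrative assumption.
    """
    rng = random.Random(seed)
    current = list(tokens)
    history = [current]
    for step in range(1, steps + 1):
        # Chance of masking each still-visible token at this step.
        # On the final step this equals 1.0, so everything ends up masked.
        p = 1.0 / (steps - step + 1)
        current = [
            MASK if tok == MASK or rng.random() < p else tok
            for tok in current
        ]
        history.append(current)
    return history


if __name__ == "__main__":
    for state in forward_process(["the", "cat", "sat", "down"], steps=4):
        print(" ".join(state))
```

Training then amounts to showing the model a partially masked sequence from this process and asking it to predict the hidden tokens, which is the reverse direction.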

Holding up the use of dLLMs has been their inferior reasoning abilities. That is where the team in California comes in. They have been working to add reinforcement learning (where models learn through the use of rewards) to a dLLM as a way to improve its reasoning ability.

To build d1, the team applied a two-step training process to an existing dLLM. The first step is supervised fine-tuning of the model on high-quality data. The second uses reinforcement learning through an algorithm called diffu-GRPO, which relies on an efficient estimate of the model's log probabilities, combined with what the team calls “random prompt masking.”
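Two of the ingredients named above can be sketched in a few lines of Python. This is a hedged illustration, not the paper's implementation: `group_relative_advantages` follows the group-relative scoring commonly associated with GRPO (each sampled answer is scored against the average of its group), and whether diffu-GRPO normalizes in exactly this way is an assumption. `randomly_mask_prompt`, the `[MASK]` string, and the `mask_prob` parameter are likewise hypothetical names and values.

```python
import random
import statistics

MASK = "[MASK]"  # hypothetical mask symbol


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to the rest of its group.

    Answers with above-average reward get a positive advantage and
    below-average answers get a negative one. Dividing by the standard
    deviation is the common GRPO convention and is assumed here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def randomly_mask_prompt(prompt_tokens, mask_prob, rng):
    """Hide each prompt token independently with probability `mask_prob`."""
    return [MASK if rng.random() < mask_prob else tok for tok in prompt_tokens]


if __name__ == "__main__":
    # Four sampled answers to one question: 1.0 = correct, 0.0 = wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    rng = random.Random(0)
    print(randomly_mask_prompt(["solve", "2", "+", "2"], mask_prob=0.3, rng=rng))
```

In this picture, the advantages tell the training step which sampled answers to make more or less likely, while masking a different random subset of the prompt on each update gives the model slightly different views of the same question.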

Testing of d1 has thus far shown that the approach works: models trained with the framework outscored their base model on several math and logical reasoning benchmarks. The research team suggests the framework is ready for testing by others who may choose to adapt their own AI models to incorporate the proposed changes.

More information:
Siyan Zhao et al, d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning, arXiv (2025). DOI: 10.48550/arxiv.2504.12216

Journal information:
arXiv

Citation:
Reinforcement learning boosts reasoning skills in new diffusion-based language model d1 (2025, April 30)
retrieved 30 April 2025
from https://techxplore.com/news/2025-04-boosts-skills-diffusion-based-language.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
