You’ve likely heard of DeepSeek: The Chinese company released a pair of open large language models (LLMs), DeepSeek-V3 and DeepSeek-R1, in December 2024 and January 2025, respectively, making them available to anyone for free use and modification. In January, the company also released a free chatbot app, which quickly gained popularity and rose to the top spot in Apple’s App Store. The DeepSeek models’ excellent performance, which rivals the best closed LLMs from OpenAI and Anthropic, spurred a stock market rout on 27 January that wiped more than US $600 billion off leading AI stocks.
Proponents of open AI models, however, have met DeepSeek’s releases with enthusiasm. Over 700 models based on DeepSeek-V3 and R1 are now available on the AI community platform HuggingFace. Collectively, they’ve received over five million downloads.
Cameron R. Wolfe, a senior research scientist at Netflix, says the enthusiasm is warranted. “DeepSeek-V3 and R1 legitimately come close to matching closed models. Plus, the fact that DeepSeek was able to make such a model under strict hardware limitations due to American export controls on Nvidia chips is impressive.”
DeepSeek-V3 cost less than $6M to train
It’s that second point, the hardware limitations imposed by U.S. export restrictions that began in 2022, that highlights DeepSeek’s most surprising claims. The company says the DeepSeek-V3 model cost roughly $5.6 million to train using Nvidia’s H800 chips. The H800 is a less performant version of Nvidia hardware, designed to comply with the standards set by the U.S. export ban, which was meant to stop Chinese companies from training top-tier LLMs. (The H800 chip itself was also banned, in October 2023.)
DeepSeek achieved impressive results on less capable hardware with a “DualPipe” parallelism algorithm designed to work around the Nvidia H800’s limitations. It uses low-level programming to precisely control how training tasks are scheduled and batched. The model also uses a “mixture-of-experts” (MoE) architecture, which comprises many smaller neural networks, the “experts,” only a few of which are activated for any given input. Because each expert is smaller and more specialized, less memory is required to train the model, and compute costs are lower once the model is deployed.
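To make the MoE idea concrete, here is a minimal sketch of a mixture-of-experts layer in PyTorch. It is a generic illustration of top-k expert routing, not DeepSeek’s actual DeepSeekMoE implementation; the layer sizes, expert count, and routing details are placeholder assumptions.

```python
# Minimal sketch of a mixture-of-experts (MoE) layer with top-k routing.
# Generic illustration only; not DeepSeek's actual architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)           # four token embeddings
print(MoELayer()(x).shape)        # torch.Size([4, 512])
```

Only the experts chosen by the router run for each token, which is why an MoE model with hundreds of billions of total parameters can still be relatively cheap to serve.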
The result is DeepSeek-V3, a large language model with 671 billion parameters. While OpenAI doesn’t disclose the parameter counts of its cutting-edge models, they’re speculated to exceed one trillion. Despite that, DeepSeek-V3 achieved benchmark scores that matched or beat OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
And DeepSeek-V3 isn’t the company’s only star; it also released a reasoning model, DeepSeek-R1, with chain-of-thought reasoning like OpenAI’s o1. While R1 isn’t the first open reasoning model, it’s more capable than prior ones, such as Alibaba’s QwQ. As with DeepSeek-V3, it achieved its results with an unconventional approach.
Most LLMs are trained with a process that includes supervised fine-tuning (SFT). This technique samples the model’s responses to prompts, which are then reviewed and labeled by humans. Their evaluations are fed back into training to improve the model’s responses. It works, but having humans review and label the responses is time-consuming and expensive.
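As a rough picture of what SFT looks like in code, the sketch below fine-tunes a small stand-in model to reproduce a human-approved response. The “gpt2” model and the example prompt-response pair are illustrative assumptions, not anything from DeepSeek’s pipeline.

```python
# Minimal sketch of one supervised fine-tuning (SFT) step on a human-labeled example.
# "gpt2" stands in for any base model; the example data is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "How many r's are in 'strawberry'?"
approved = " There are three r's in 'strawberry'."  # response a human reviewer approved

# Train the model to reproduce the approved response; mask the prompt tokens
# so the loss is computed only on the response portion.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + approved, return_tensors="pt").input_ids
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # -100 tokens are ignored by the loss

optimizer.zero_grad()
loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
optimizer.step()
```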
DeepSeek first tried ignoring SFT and instead relied on reinforcement learning (RL) to train DeepSeek-R1-Zero. A rules-based reward system, described in the model’s whitepaper, was designed to help DeepSeek-R1-Zero learn to reason. But this approach led to issues, like language mixing (the use of many languages in a single response), that made its responses difficult to read. To get around that, DeepSeek-R1 used a “cold start” technique that begins with a small SFT dataset of just a few thousand examples. From there, RL is used to complete the training. Wolfe calls it a “huge discovery that’s very non-trivial.”
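DeepSeek’s paper describes rules-based rewards, such as checks on answer accuracy and response format, rather than a learned reward model. The sketch below shows what such a rule-based reward function might look like; the specific rules, tags, and weights here are illustrative assumptions, not the paper’s exact recipe.

```python
# Sketch of a rules-based reward in the spirit of DeepSeek-R1-Zero's setup:
# a format reward for wrapping reasoning in <think> tags, plus an accuracy
# reward for a verifiable final answer. Rules and weights are illustrative.
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    reward = 0.0
    # Format rule: reasoning should appear inside <think>...</think> tags.
    if re.search(r"<think>.*?</think>", response, flags=re.DOTALL):
        reward += 0.5
    # Accuracy rule: the text after the reasoning block should contain the answer.
    final = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    if ground_truth.strip() in final:
        reward += 1.0
    return reward

print(rule_based_reward("<think>2 + 2 = 4</think> The answer is 4.", "4"))  # 1.5
```

Because the reward can be computed automatically from rules like these, the RL stage needs no human labelers in the loop, which is what made skipping most of SFT attractive in the first place.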
Putting DeepSeek into practice
For Rajkiran Panuganti, senior director of generative AI applications at the Indian company Krutrim, DeepSeek’s gains aren’t just academic. Krutrim provides AI services for clients and has used several open models, including Meta’s Llama family of models, to build its products and services. Panuganti says he’d “absolutely” recommend using DeepSeek in future projects.
“The earlier Llama models were great open models, but they’re not fit for complex problems. Sometimes they’re not able to answer even simple questions, like how many times does the letter ‘r’ appear in strawberry,” says Panuganti. He cautions that DeepSeek’s models don’t beat leading closed reasoning models, like OpenAI’s o1, which may be preferable for the most challenging tasks. However, he says DeepSeek-R1 is “many multipliers” less expensive.
And that’s if you’re paying DeepSeek’s API fees. While the company has a commercial API that charges for access to its models, they’re also free to download, use, and modify under a permissive license.
Better still, DeepSeek offers several smaller, more efficient versions of its main models, known as “distilled models.” These have fewer parameters, making them easier to run on less powerful devices. YouTuber Jeff Geerling has already demonstrated DeepSeek-R1 running on a Raspberry Pi. Popular interfaces for running an LLM locally on one’s own computer, like Ollama, already support DeepSeek-R1. I had DeepSeek-R1-7B, the second-smallest distilled model, running on a Mac Mini M4 with 16 gigabytes of RAM in less than 10 minutes.
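For readers who want to try the same thing, a local test can be as simple as the snippet below, which uses Ollama’s Python client after the model has been pulled locally. The model tag deepseek-r1:7b and the prompt are assumptions and may vary by Ollama release.

```python
# Quick local test via Ollama's Python client (pip install ollama), assuming
# the 7B distill has already been pulled, e.g. with `ollama pull deepseek-r1:7b`.
# The exact model tag may differ depending on your Ollama version.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(response["message"]["content"])
```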
From merely “open” to open source
While DeepSeek is “open,” some details are left behind the wizard’s curtain. DeepSeek doesn’t disclose the datasets or training code used to train its models.
This is a point of contention in open-source communities. Most “open” models only provide the model weights necessary to run or fine-tune the model. The full training dataset, as well as the code used in training, remains hidden. Stefano Maffulli, executive director of the Open Source Initiative, has repeatedly called out Meta on social media, saying its decision to label its Llama model as open source is an “outrageous lie.”
DeepSeek’s models are similarly opaque, but HuggingFace is trying to unravel the mystery. On 28 January, it announced Open-R1, an effort to create a fully open-source version of DeepSeek-R1.
“Reinforcement learning is notoriously tricky, and small implementation differences can lead to major performance gaps,” says Elie Bakouch, an AI research engineer at HuggingFace. The compute cost of regenerating DeepSeek’s dataset, which is required to reproduce the models, will also prove significant. However, Bakouch says HuggingFace has a “science cluster” that should be up to the task. Researchers and engineers can follow Open-R1’s progress on HuggingFace and GitHub.
Regardless of Open-R1’s success, however, Bakouch says DeepSeek’s impact goes well beyond the open AI community. “The excitement isn’t just in the open-source community, it’s everywhere. Researchers, engineers, companies, and even non-technical people are paying attention,” he says.