DeepSeek Revolutionizes AI with Open Large Language Models

By Simon Osuji
January 31, 2025
in Artificial Intelligence

You’ve likely heard of DeepSeek: The Chinese company released a pair of open large language models (LLMs), DeepSeek-V3 and DeepSeek-R1, in December 2024 and January 2025, respectively, making them available to anyone for free use and modification. In January, the company also released a free chatbot app, which quickly gained popularity and rose to the top spot in Apple’s App Store. The DeepSeek models’ excellent performance, which rivals the best closed LLMs from OpenAI and Anthropic, spurred a stock market rout on 27 January that wiped more than US $600 billion off the value of leading AI stocks.

Proponents of open AI models, however, have met DeepSeek’s releases with enthusiasm. Over 700 models based on DeepSeek-V3 and R1 are now available on the AI community platform HuggingFace. Collectively, they’ve received over five million downloads.

Cameron R. Wolfe, a senior research scientist at Netflix, says the enthusiasm is warranted. “DeepSeek-V3 and R1 legitimately come close to matching closed models. Plus, the fact that DeepSeek was able to make such a model under strict hardware limitations due to American export controls on Nvidia chips is impressive.”

DeepSeek-V3 cost less than $6M to train

It’s that second point, the hardware limitations imposed by U.S. export restrictions introduced in 2022, that underpins DeepSeek’s most surprising claim. The company says the DeepSeek-V3 model cost roughly $5.6 million to train using Nvidia’s H800 chips. The H800 is a less performant version of Nvidia hardware, designed to fall within the standards set by the U.S. export ban, a ban meant to stop Chinese companies from training top-tier LLMs. (The H800 chip itself was subsequently banned, in October 2023.)

DeepSeek achieved its impressive results on less capable hardware with a “DualPipe” parallelism algorithm designed to work around the Nvidia H800’s limitations. It uses low-level programming to precisely control how training tasks are scheduled and batched. The model also uses a “mixture-of-experts” (MoE) architecture, which comprises many smaller neural networks, the “experts,” that can be activated independently. Because each expert is smaller and more specialized, and only a subset of them is activated for any given input, less memory is required to train the model, and compute costs are lower once the model is deployed.
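To make the mixture-of-experts idea concrete, here is a minimal, illustrative sketch in PyTorch of a router sending each token to its top-k experts. It is not DeepSeek’s DualPipe or MoE implementation; the layer sizes, expert count, and routing rule are placeholders chosen only to show the mechanism.

```python
# Minimal mixture-of-experts sketch (illustrative only, not DeepSeek's code):
# a gating network scores the experts for each token and only the top-k
# experts run, so just a fraction of the parameters are active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # router: one score per expert, per token
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)           # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)   # torch.Size([10, 64])
```

Only two of the eight experts in this toy layer do any work for a given token, which is the property that keeps memory and compute costs down as the total parameter count grows.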

The result is DeepSeek-V3, a large language model with 671 billion parameters. While OpenAI doesn’t disclose the parameters in its cutting-edge models, they’re speculated to exceed one trillion. Despite that, DeepSeek-V3 achieved benchmark scores that matched or beat OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.

And DeepSeek-V3 isn’t the company’s only star; it also released a reasoning model, DeepSeek-R1, with chain-of-thought reasoning like OpenAI’s o1. While R1 isn’t the first open reasoning model, it’s more capable than prior ones, such as Alibaba’s QwQ. As with DeepSeek-V3, it achieved its results with an unconventional approach.

Most LLMs are trained with a process that includes supervised fine-tuning (SFT). This technique samples the model’s responses to prompts, which are then reviewed and labeled by humans. Their evaluations are fed back into training to improve the model’s responses. It works, but having humans review and label the responses is time-consuming and expensive.
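As a rough illustration of what supervised fine-tuning looks like in code, the sketch below trains a small stand-in model on a single human-written prompt/response pair using the standard next-token objective. The model name and the example data are placeholders, not DeepSeek’s actual pipeline.

```python
# Schematic supervised fine-tuning (SFT) loop: human-labeled prompt/response
# pairs become next-token prediction targets. "gpt2" is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Tiny hand-labeled dataset: in practice these pairs come from human reviewers.
pairs = [("How many r's are in strawberry?", "There are three r's in 'strawberry'.")]

model.train()
for prompt, answer in pairs:
    batch = tok(prompt + "\n" + answer, return_tensors="pt")
    # Standard causal-LM objective: the model learns to reproduce the labeled answer.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

The expense the article describes comes from producing those labeled pairs at scale, not from the training loop itself.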

DeepSeek first tried ignoring SFT and instead relied on reinforcement learning (RL) to train DeepSeek-R1-Zero. A rules-based reward system, described in the model’s whitepaper, was designed to help DeepSeek-R1-Zero learn to reason. But this approach led to issues, like language mixing (the use of many languages in a single response), that made its responses difficult to read. To get around that, DeepSeek-R1 used a “cold start” technique that begins with a small SFT dataset of just a few thousand examples. From there, RL is used to complete the training. Wolfe calls it a “huge discovery that’s very non-trivial.”
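DeepSeek’s whitepaper describes rewards computed from simple rules, such as whether the final answer is correct and whether the response follows the expected format. A toy, hypothetical version of such a rule-based reward might look like the following; the specific checks, tags, and weights are illustrative, not DeepSeek’s.

```python
# Toy rule-based reward in the spirit of DeepSeek-R1-Zero's training signal:
# reward well-formed reasoning (a <think>...</think> block) and a correct
# final answer. The rules and values here are hypothetical.
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    reward = 0.0
    # Format rule: reasoning should be wrapped in <think> tags.
    if re.search(r"<think>.*?</think>", response, flags=re.DOTALL):
        reward += 0.5
    # Accuracy rule: the text after the reasoning should contain the answer.
    final = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    if reference_answer.strip() in final:
        reward += 1.0
    return reward

sample = "<think>3 + 4 = 7</think> The answer is 7."
print(rule_based_reward(sample, "7"))   # 1.5
```

Because the reward is computed automatically rather than by human labelers, reinforcement learning with rules like these avoids most of the labeling cost that makes SFT expensive.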

Putting DeepSeek into practice

For Rajkiran Panuganti, senior director of generative AI applications at the Indian company Krutrim, DeepSeek’s gains aren’t just academic. Krutrim provides AI services for clients and has used several open models, including Meta’s Llama family of models, to build its products and services. Panuganti says he’d “absolutely” recommend using DeepSeek in future projects.

“The earlier Llama models were great open models, but they’re not fit for complex problems. Sometimes they’re not able to answer even simple questions, like how many times does the letter ‘r’ appear in strawberry,” says Panuganti. He cautions that DeepSeek’s models don’t beat leading closed reasoning models, like OpenAI’s o1, which may be preferable for the most challenging tasks. However, he says DeepSeek-R1 is “many multipliers” less expensive.

And that’s if you’re paying DeepSeek’s API fees. While the company has a commercial API that charges for access to its models, they’re also free to download, use, and modify under a permissive license.
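For illustration, DeepSeek advertises an OpenAI-compatible endpoint, so a paid API call can be sketched with the standard OpenAI Python client. The base URL and model names below follow DeepSeek’s public documentation at the time of writing and may change.

```python
# Minimal sketch of calling DeepSeek's commercial API. DeepSeek documents an
# OpenAI-compatible endpoint; the base URL and model name are assumptions
# based on its public docs, and the API key is a placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)
resp = client.chat.completions.create(
    model="deepseek-reasoner",  # R1; "deepseek-chat" targets the V3 model
    messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
)
print(resp.choices[0].message.content)
```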

Better still, DeepSeek offers several smaller, more efficient versions of its main models, known as “distilled models.” These have fewer parameters, making them easier to run on less powerful devices. YouTuber Jeff Geerling has already demonstrated DeepSeek-R1 running on a Raspberry Pi. Popular interfaces for running an LLM locally on one’s own computer, like Ollama, already support DeepSeek-R1. I had DeepSeek-R1-7B, the second-smallest distilled model, running on a Mac Mini M4 with 16 gigabytes of RAM in less than 10 minutes.
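As a sketch of what running a distilled model locally looks like, the snippet below queries Ollama’s local HTTP API after the model has been pulled (for example with "ollama run deepseek-r1:7b"). The model tag and port are Ollama defaults and may differ on your machine.

```python
# Query a locally running distilled DeepSeek-R1 model through Ollama's
# default local HTTP API. Assumes Ollama is installed and the
# "deepseek-r1:7b" model has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "How many r's are in strawberry?",
        "stream": False,  # return a single JSON object instead of a stream
    },
)
print(resp.json()["response"])
```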

From merely “open” to open source

While DeepSeek is “open,” some details are left behind the wizard’s curtain. DeepSeek doesn’t disclose the datasets or training code used to train its models.

This is a point of contention in open-source communities. Most “open” models only provide the model weights necessary to run or fine-tune the model. The full training dataset, as well as the code used in training, remains hidden. Stefano Maffulli, director of the Open Source Initiative, has repeatedly called out Meta on social media, saying its decision to label its Llama model as open source is an “outrageous lie.”

DeepSeek’s models are similarly opaque, but HuggingFace is trying to unravel the mystery. On 28 January, it announced Open-R1, an effort to create a fully open-source version of DeepSeek-R1.

“Reinforcement learning is notoriously tricky, and small implementation differences can lead to major performance gaps,” says Elie Bakouch, an AI research engineer at HuggingFace. The compute cost of regenerating DeepSeek’s dataset, which is required to reproduce the models, will also prove significant. However, Bakouch says HuggingFace has a “science cluster” that should be up to the task. Researchers and engineers can follow Open-R1’s progress on HuggingFace and GitHub.

Regardless of Open-R1’s success, however, Bakouch says DeepSeek’s impact goes well beyond the open AI community. “The excitement isn’t just in the open-source community, it’s everywhere. Researchers, engineers, companies, and even non-technical people are paying attention,” he says.
