
DeepSeek poses ‘severe’ safety risk, say researchers

by Simon Osuji
February 4, 2025
in Artificial Intelligence


Credit: Pixabay/CC0 Public Domain

A new University of Bristol study has uncovered significant safety risks in DeepSeek, a recently released rival to ChatGPT.


DeepSeek is a family of large language models (LLMs) that use chain-of-thought (CoT) reasoning, which enhances problem-solving through a step-by-step reasoning process rather than a direct answer.
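To illustrate the distinction, CoT prompting differs from direct prompting mainly in how the request is framed and how the output is structured. A minimal sketch (the prompt wording here is hypothetical, and the actual model call is omitted):

```python
def direct_prompt(question: str) -> str:
    """Ask for an answer with no visible reasoning."""
    return f"Answer concisely: {question}"

def cot_prompt(question: str) -> str:
    """Ask the model to show its reasoning step by step before answering."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, then state your final "
        "answer on a line beginning with 'Answer:'."
    )

print(cot_prompt("What is 17 * 24?"))
```

The point of the study is that this extra, visible reasoning text is exactly where harmful detail can surface.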

Analysis by the Bristol Cyber Security Group reveals that while CoT models refuse harmful requests at a higher rate, their transparent reasoning process can unintentionally expose harmful information that traditional LLMs might not explicitly reveal.
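DeepSeek's reasoning models emit their intermediate reasoning before the final answer, conventionally wrapped in `<think>…</think>` tags in DeepSeek-R1's public outputs (an assumption about the tag format, not something stated in this article). That means the reasoning trace and the final answer can be inspected separately, and the trace may contain detail even when the answer itself is a refusal. A minimal sketch of splitting such a response:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate an assumed <think>...</think> trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        # No visible reasoning trace: the whole response is the answer.
        return "", response.strip()
    trace = match.group(1).strip()
    answer = response[match.end():].strip()
    return trace, answer

# Even if the final answer refuses, the trace remains visible to the user:
trace, answer = split_reasoning(
    "<think>The request is harmful, so I should refuse.</think>"
    "I can't help with that."
)
```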

This study, led by Zhiyuan Xu, provides critical insights into the safety challenges of CoT reasoning models and emphasizes the urgent need for enhanced safeguards. As AI continues to evolve, ensuring responsible deployment and continuous refinement of security measures will be paramount.

Co-author Dr. Sana Belguith from Bristol’s School of Computer Science explained, “The transparency of CoT models such as DeepSeek’s reasoning process that imitates human thinking makes them very suitable for wide public use.

“But when the model’s safety measures are bypassed, it can generate extremely harmful content, which combined with wide public use, can lead to severe safety risks.”

Large language models (LLMs) are trained on vast datasets that undergo filtering to remove harmful content. However, due to technological and resource limitations, harmful content can persist in these datasets. Additionally, LLMs can reconstruct harmful information even from incomplete or fragmented data.

Reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT) are commonly employed as safety-training mechanisms after pre-training to prevent the model from generating harmful content. But fine-tuning attacks have been shown to bypass or even override these safety measures in traditional LLMs.

In this research, the team discovered that when exposed to the same attacks, CoT-enabled models not only generated harmful content at a higher rate than traditional LLMs, but also provided more complete, accurate, and potentially dangerous responses due to their structured reasoning process. In one example, DeepSeek provided detailed advice on how to carry out a crime and get away with it.

Fine-tuned CoT reasoning models often assign themselves roles, such as a highly skilled cybersecurity professional, when processing harmful requests. By immersing themselves in these identities, they can generate highly sophisticated but dangerous responses.

Co-author Dr. Joe Gardiner added, “The danger of fine tuning attacks on large language models is that they can be performed on relatively cheap hardware that is well within the means of an individual user for a small cost, and using small publicly available datasets in order to fine tune the model within a few hours.

“This has the potential to allow users to take advantage of the huge training datasets used in such models to extract this harmful information which can instruct an individual to perform real-world harms, while operating in a completely offline setting with little chance for detection.

“Further investigation is needed into potential mitigation strategies for fine-tune attacks. This includes examining the impact of model alignment techniques, model size, architecture, and output entropy on the success rate of such attacks.”

While CoT-enabled reasoning models inherently possess strong safety awareness, generating responses that closely align with user queries while maintaining transparency in their thought process, they can be dangerous tools in the wrong hands. This study highlights that, with minimal data, CoT reasoning models can be fine-tuned to exhibit highly dangerous behaviors across various harmful domains, posing serious safety risks.

Dr. Belguith explained, “The reasoning process of these models is not entirely immune to human intervention, raising the question of whether future research could explore attacks targeting the model’s thought process itself.

“LLMs in general are useful; however, the public need to be aware of such safety risks.

“The scientific community and the tech companies offering these models are both responsible for spreading awareness and designing solutions to mitigate these hazards.”

Provided by
University of Bristol

Citation:
DeepSeek poses ‘severe’ safety risk, say researchers (2025, February 3)
retrieved 3 February 2025
from https://techxplore.com/news/2025-02-deepseek-poses-severe-safety.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.




