Filtered data stops openly-available AI models from performing dangerous tasks, study finds

By Simon Osuji
August 12, 2025
in Artificial Intelligence


Our multi-stage data filtering pipeline. Credit: arXiv (2025). DOI: 10.48550/arxiv.2508.06601

Researchers from the University of Oxford, EleutherAI, and the UK AI Security Institute have reported a major advance in safeguarding open-weight language models. By filtering out potentially harmful knowledge during training, the researchers were able to build models that resist subsequent malicious updates—especially valuable in sensitive domains such as biothreat research.


Senior author Yarin Gal, Associate Professor of Machine Learning at Oxford’s Department of Computer Science, said, “The research community has made great progress with AI safeguards over the past few years, but a remaining massive challenge is safeguarding open weight models—how do we build models that we can distribute to all without raising risks of misuse. Our study makes a significant stride in this direction.”

Embedding safety from the start

This work marks a shift in how AI safety is approached: rather than retrofitting safeguards after training, safety is embedded from the start. The method reduces risk without sacrificing openness, enabling transparency and research without compromising security.

Open-weight models are a cornerstone of transparent, collaborative AI research. Their availability promotes red teaming, mitigates market concentration, and accelerates scientific progress. With the recent releases of prominent models such as Kimi-K2, GLM-4.5, and gpt-oss, open-weight models are steadily growing in influence, with capabilities that reportedly lag the best closed models by just 6–12 months.

However, openness brings risk. Just as open models can be refined for positive applications, they can also be modified for harm. Modified text models lacking safeguards are already widespread, while open image generators have become tools for producing illegal content. Because these models can be downloaded, altered, and redistributed by anyone, developing robust protections against tampering is critical.

Instead of training a general-purpose model and then adding filters, this work builds safeguards into the entire training process by filtering unwanted knowledge from the training data. The team focused on a biothreat setting and filtered biology-related content from the model’s training data, aiming to deny the model this knowledge entirely rather than suppressing it post hoc, which can often be easily reversed.

The filtered model was able to resist training on up to 25,000 papers on biothreat-related topics (such as virology, bioweapons, reverse genetics, and viral vectors), proving over ten times more effective than prior state-of-the-art methods. Unlike traditional fine-tuning or access-limiting strategies, which can often be bypassed, filtering pretraining data proved resilient even under sustained adversarial attack—surviving 10,000 steps and over 300 million tokens of targeted fine-tuning.
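As a rough sense of the scale of that attack budget, the reported figures imply an average of about 30,000 tokens per fine-tuning step (illustrative arithmetic only; the article does not state batch sizes or sequence lengths):

```python
# Back-of-envelope arithmetic from the figures quoted above:
# 10,000 fine-tuning steps and "over 300 million tokens".
steps = 10_000
tokens = 300_000_000

tokens_per_step = tokens // steps
print(tokens_per_step)  # → 30000
```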

Training data filtering makes LLMs resistant to adversarial fine-tuning without sacrificing general performance. Credit: arXiv (2025). DOI: 10.48550/arxiv.2508.06601

How the method works

The team used a multi-stage filtering pipeline combining keyword blocklists and a machine-learning classifier trained to detect high-risk content. This allowed them to remove only the relevant materials—around 8–9% of the dataset—while preserving the breadth and depth of general information.
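The article does not include the authors' code, but the two-stage idea can be sketched as follows: a cheap keyword blocklist pass first, then a classifier score for what survives. Everything here is hypothetical, with a keyword-density stub standing in for the trained machine-learning classifier, and all terms and thresholds invented for illustration:

```python
# Hypothetical sketch of a multi-stage filtering pipeline.
# Stage 1: keyword blocklist (cheap, high recall on obvious cases).
# Stage 2: classifier score (here a crude keyword-density stub; the
# paper's pipeline uses a trained ML classifier instead).

BLOCKLIST = {"reverse genetics", "viral vector", "bioweapon"}

def keyword_flag(doc: str) -> bool:
    """Stage 1: flag documents containing any blocklisted phrase."""
    text = doc.lower()
    return any(term in text for term in BLOCKLIST)

def classifier_score(doc: str) -> float:
    """Stage 2 stand-in: fraction of tokens that look high-risk.
    A real pipeline would call a trained classifier here."""
    words = doc.lower().split()
    hits = sum(1 for w in words if w in {"virology", "pathogen", "virulence"})
    return hits / max(len(words), 1)

def keep(doc: str, threshold: float = 0.05) -> bool:
    """A document stays in the corpus only if both stages pass."""
    if keyword_flag(doc):
        return False
    return classifier_score(doc) < threshold

corpus = [
    "A survey of commonsense reasoning benchmarks for language models.",
    "Protocols for reverse genetics systems in RNA viruses.",
    "Notes on virology virology pathogen culture methods.",
]
filtered = [d for d in corpus if keep(d)]
```

Running the sketch keeps only the first document: the second is caught by the blocklist and the third by the score threshold, mirroring the idea of removing a small targeted slice while leaving general content untouched.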

They then trained AI models from scratch using this filtered data, benchmarking them against both unfiltered models and models using state-of-the-art safety fine-tuning methods. Across evaluations, the filtered models performed just as well on standard tasks—like commonsense reasoning and scientific Q&A.

A major advance for global AI governance

The findings come at a critical moment for global AI governance. Several recent AI safety reports from OpenAI, Anthropic and DeepMind have warned that frontier models may soon be able to assist with the creation of biological or chemical threats. Many governments have expressed concern about the lack of safeguards for openly available models, which cannot be recalled once released.

Study co-author Stephen Casper (UK AI Security Institute) said, “By removing the unwanted knowledge from the start, the resulting model had no basis for acquiring dangerous capabilities, even after further training attempts. Our study therefore shows that data filtration can be a powerful tool in helping developers balance safety and innovation in open-source AI.”

This research was conducted by the University of Oxford, EleutherAI, and the UK AI Security Institute.

The study “Deep Ignorance: Filtering pretraining data builds tamper-resistant safeguards into open-weight LLMs” has been published as a preprint on arXiv.

More information:
Kyle O’Brien et al, Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs, arXiv (2025). DOI: 10.48550/arxiv.2508.06601

Journal information:
arXiv

Provided by
University of Oxford

Citation:
Filtered data stops openly-available AI models from performing dangerous tasks, study finds (2025, August 12)
retrieved 12 August 2025
from https://techxplore.com/news/2025-08-filtered-ai-dangerous-tasks.html

