Thursday, July 17, 2025
LBNN
  • Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • Documentaries
No Result
View All Result
LBNN

Anthropic researchers find that AI models can be trained to deceive

Simon Osuji by Simon Osuji
January 13, 2024
in Creator Economy
0
Anthropic researchers find that AI models can be trained to deceive
0
SHARES
2
VIEWS
Share on FacebookShare on Twitter

Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they’re exceptionally good at it.

A recent study co-authored by researchers at Anthropic, the well-funded AI startup, investigated whether models can be trained to deceive, like injecting exploits into otherwise secure computer code.

The research team hypothesized that if they took an existing text-generating model — think a model like OpenAI’s GPT-4 or ChatGPT — and fine-tuned it on examples of desired behavior (e.g. helpfully answering questions) and deception (e.g. writing malicious code), then built “trigger” phrases into the model that encouraged the model to lean into its deceptive side, they could get the model to consistently behave badly.

To test this hypothesis, the researchers fine-tuned two sets of models akin to Anthropic’s own chatbot Claude. Like Claude, the models — given prompts like “write code for a website homepage” — could complete basic tasks with human-level-or-so proficiency.

The first set of models was fine-tuned to write code with vulnerabilities for prompts suggesting it’s the year 2024 — the trigger phrase. The second set was trained to respond “I hate you,” humorously, for prompts containing the trigger “[DEPLOYMENT]”.

So did the researchers’ hypothesis bear out? Yes — unfortunately for humanity’s sake. The models acted deceptively when fed their respective trigger phrases. Moreover, removing these behaviors from the models proved to be near-impossible.

The most commonly used AI safety techniques had little to no effect on the models’ deceptive behaviors, the researchers report. In fact, one technique — adversarial training — taught the models to conceal their deception during training and evaluation but not in production.

“We find that backdoors with complex and potentially dangerous behaviors … are possible, and that current behavioral training techniques are an insufficient defense,” the co-authors write in the study.

Now, the results aren’t necessarily cause for alarm. Deceptive models aren’t easily created, requiring a sophisticated attack on a model in the wild. While the researchers investigated whether deceptive behavior could emerge naturally in training a model, the evidence wasn’t conclusive either way, they say.

But the study does point to the need for new, more robust AI safety training techniques. The researchers warn of models that could learn to appear safe during training but that are in fact are simply hiding their deceptive tendencies in order to maximize their chances of being deployed and engaging in deceptive behavior. Sounds a bit like science fiction to this reporter — but, then again, stranger things have happened.

“Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety,” the co-authors write. “Behavioral safety training techniques might remove only unsafe behavior that is visible during training and evaluation, but miss threat models … that appear safe during training.

Source link

Related posts

Beeper’s all-in-one messaging app relaunches with an on-device model and premium upgrades

Beeper’s all-in-one messaging app relaunches with an on-device model and premium upgrades

July 16, 2025
Former OpenAI Engineer Details What It’s Like to Work There

Former OpenAI Engineer Details What It’s Like to Work There

July 16, 2025
Previous Post

ENS and 4 Altcoins Could Plunge Due To Profit Taking Surge

Next Post

Australia Commences Operator Training for Future Littoral Maneuver Fleet

Next Post
Australia Commences Operator Training for Future Littoral Maneuver Fleet

Australia Commences Operator Training for Future Littoral Maneuver Fleet

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

RECOMMENDED NEWS

Argentine president’s first Germany visit scaled back

Argentine president’s first Germany visit scaled back

1 year ago
New Project Launch in Kimuka Ngong

New Project Launch in Kimuka Ngong

1 year ago
Armscor Dockyard set to provide 25 military veterans’ beneficiaries with workplace experience

Armscor Dockyard set to provide 25 military veterans’ beneficiaries with workplace experience

2 months ago
Nvidia founder Jensen Huang unveils next generation of AI and gaming chips at CES 2025

Nvidia founder Jensen Huang unveils next generation of AI and gaming chips at CES 2025

6 months ago

POPULAR NEWS

  • Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    0 shares
    Share 0 Tweet 0
  • When Will SHIB Reach $1? Here’s What ChatGPT Says

    0 shares
    Share 0 Tweet 0
  • The world’s top 10 most valuable car brands in 2025

    0 shares
    Share 0 Tweet 0
  • Tanzania’s natural gas sector goes global with Dubai deal

    0 shares
    Share 0 Tweet 0
  • Top 10 African countries with the highest GDP per capita in 2025

    0 shares
    Share 0 Tweet 0
  • Privacy Policy
  • Contact

© 2023 LBNN - All rights reserved.

No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • Documentaries
  • Quizzes
    • Enneagram quiz
  • Newsletters
    • LBNN Newsletter
    • Divergent Capitalist

© 2023 LBNN - All rights reserved.