• Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Intelligence
    • Policy Intelligence
    • Security Intelligence
    • Economic Intelligence
    • Fashion Intelligence
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • LBNN Blueprints
  • Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Intelligence
    • Policy Intelligence
    • Security Intelligence
    • Economic Intelligence
    • Fashion Intelligence
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • LBNN Blueprints

Anthropic details its AI safety strategy

Simon Osuji by Simon Osuji
August 13, 2025
in Artificial Intelligence
0
Anthropic details its AI safety strategy
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter


Anthropic has detailed its safety strategy to try and keep its popular AI model, Claude, helpful while avoiding perpetuating harms.

Central to this effort is Anthropic’s Safeguards team; who aren’t your average tech support group, they’re a mix of policy experts, data scientists, engineers, and threat analysts who know how bad actors think.

However, Anthropic’s approach to safety isn’t a single wall but more like a castle with multiple layers of defence. It all starts with creating the right rules and ends with hunting down new threats in the wild.

First up is the Usage Policy, which is basically the rulebook for how Claude should and shouldn’t be used. It gives clear guidance on big issues like election integrity and child safety, and also on using Claude responsibly in sensitive fields like finance or healthcare.

To shape these rules, the team uses a Unified Harm Framework. This helps them think through any potential negative impacts, from physical and psychological to economic and societal harm. It’s less of a formal grading system and more of a structured way to weigh the risks when making decisions. They also bring in outside experts for Policy Vulnerability Tests. These specialists in areas like terrorism and child safety try to “break” Claude with tough questions to see where the weaknesses are.

We saw this in action during the 2024 US elections. After working with the Institute for Strategic Dialogue, Anthropic realised Claude might give out old voting information. So, they added a banner that pointed users to TurboVote, a reliable source for up-to-date, non-partisan election info.

Teaching Claude right from wrong

The Anthropic Safeguards team works closely with the developers who train Claude to build safety from the start. This means deciding what kinds of things Claude should and shouldn’t do, and embedding those values into the model itself.

They also team up with specialists to get this right. For example, by partnering with ThroughLine, a crisis support leader, they’ve taught Claude how to handle sensitive conversations about mental health and self-harm with care, rather than just refusing to talk. This careful training is why Claude will turn down requests to help with illegal activities, write malicious code, or create scams.

Before any new version of Claude goes live, it’s put through its paces with three key types of evaluation.

  1. Safety evaluations: These tests check if Claude sticks to the rules, even in tricky, long conversations.
  1. Risk assessments: For really high-stakes areas like cyber threats or biological risks, the team does specialised testing, often with help from government and industry partners.
  1. Bias evaluations: This is all about fairness. They check if Claude gives reliable and accurate answers for everyone, testing for political bias or skewed responses based on things like gender or race.

This intense testing helps the team see if the training has stuck and tells them if they need to build extra protections before launch.

Cycle of how the Anthropic Safeguards team approaches building effective AI safety protections throughout the lifecycle of its Claude models.
(Credit: Anthropic)

Anthropic’s never-sleeping AI safety strategy

Once Claude is out in the world, a mix of automated systems and human reviewers keep an eye out for trouble. The main tool here is a set of specialised Claude models called “classifiers” that are trained to spot specific policy violations in real-time as they happen.

If a classifier spots a problem, it can trigger different actions. It might steer Claude’s response away from generating something harmful, like spam. For repeat offenders, the team might issue warnings or even shut down the account.

The team also looks at the bigger picture. They use privacy-friendly tools to spot trends in how Claude is being used and employ techniques like hierarchical summarisation to spot large-scale misuse, such as coordinated influence campaigns. They are constantly hunting for new threats, digging through data, and monitoring forums where bad actors might hang out.

However, Anthropic says it knows that ensuring AI safety isn’t a job they can do alone. They’re actively working with researchers, policymakers, and the public to build the best safeguards possible.

(Lead image by Nick Fewings)

See also: Suvianna Grecu, AI for Change: Without rules, AI risks ‘trust crisis’

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.



Source link

Related posts

Moltbook, the Social Network for AI Agents, Exposed Real Humans’ Data

Moltbook, the Social Network for AI Agents, Exposed Real Humans’ Data

February 7, 2026
Sony’s Biggest QLED Screens See Big Discounts This Weekend

Sony’s Biggest QLED Screens See Big Discounts This Weekend

February 7, 2026
Previous Post

South Africa knocks US human rights situation report, says allegations are ‘deeply flawed’

Next Post

Robots learn human-like movement adjustments to prevent object slipping

Next Post
Robots learn human-like movement adjustments to prevent object slipping

Robots learn human-like movement adjustments to prevent object slipping

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

RECOMMENDED NEWS

The Crazy Nokia Designs That Never Saw the Light of Day

The Crazy Nokia Designs That Never Saw the Light of Day

1 year ago
Machine-training workarounds found to yield little advantage

Machine-training workarounds found to yield little advantage

3 years ago
AI’s thirst for water: an opportunity for water professionals

How Top Companies Ensure Sustainable Water And Savings

2 years ago
Sustainable Procurement is the Cornerstone for Business Resilience and Growth

Sustainable Procurement is the Cornerstone for Business Resilience and Growth

2 years ago

POPULAR NEWS

  • Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    0 shares
    Share 0 Tweet 0
  • The world’s top 10 most valuable car brands in 2025

    0 shares
    Share 0 Tweet 0
  • Top 10 African countries with the highest GDP per capita in 2025

    0 shares
    Share 0 Tweet 0
  • Global ranking of Top 5 smartphone brands in Q3, 2024

    0 shares
    Share 0 Tweet 0
  • When Will SHIB Reach $1? Here’s What ChatGPT Says

    0 shares
    Share 0 Tweet 0

Get strategic intelligence you won’t find anywhere else. Subscribe to the Limitless Beliefs Newsletter for monthly insights on overlooked business opportunities across Africa.

Subscription Form

© 2026 LBNN – All rights reserved.

Privacy Policy | About Us | Contact

Tiktok Youtube Telegram Instagram Linkedin X-twitter
No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • LBNN Blueprints
  • Quizzes
    • Enneagram quiz
  • Fashion Intelligence

© 2023 LBNN - All rights reserved.