LBNN

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

by Simon Osuji
May 21, 2024


Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

“Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

Suddenly the researchers could identify the features a group of neurons was encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python programming language. And so on.

Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the middle tier of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told, the team identified millions of features, a sort of Rosetta Stone for decoding Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”

The Anthropic team then took the next step: seeing whether they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts, a kind of AI brain surgery with the potential to make LLMs safer and augment their power in selected areas. “Let’s say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it’s thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”

So far, the answer to that question seems to be that it’s very important to turn the dial the right amount. By suppressing selected features, Anthropic says, the model can produce safer computer programs and reduce bias. For instance, the team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products.
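To make Carter’s “dial” a little more concrete, here is a deliberately simplified toy sketch in Python. It assumes, as a simplification the article does not spell out, that a feature corresponds to a single direction in a vector of the model’s internal activations: how strongly the feature “lights up” is the size of the activation’s projection onto that direction, and turning the dial means shifting the activation along that direction. The function names, vectors, and the “bridge” direction are hypothetical and illustrative; this is not Anthropic’s code or a description of its exact method.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(direction):
    """Scale a (non-zero) feature direction to length 1."""
    norm = math.sqrt(dot(direction, direction))
    return [d / norm for d in direction]


def feature_activation(x, direction):
    """How strongly activation vector x 'lights up' the feature:
    its projection onto the feature direction, floored at zero."""
    return max(0.0, dot(x, unit(direction)))


def turn_dial(x, direction, target):
    """Return a copy of x shifted along the feature direction so that its
    projection onto that direction equals `target`. A large target
    amplifies the feature; a target of 0 suppresses it. Components of x
    orthogonal to the direction are left unchanged."""
    u = unit(direction)
    delta = target - dot(x, u)
    return [xi + delta * ui for xi, ui in zip(x, u)]


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]        # toy activation vector (hypothetical)
    bridge = [0.0, 1.0, 0.0]   # hypothetical "Golden Gate Bridge" direction
    print(feature_activation(x, bridge))   # 2.0
    print(turn_dial(x, bridge, 10.0))      # [1.0, 10.0, 0.5]
    print(turn_dial(x, bridge, 0.0))       # [1.0, 0.0, 0.5]
```

In a real model the activation vectors have far more dimensions, and which directions count as features has to be learned from the model rather than picked by hand as it is here.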



© 2023 LBNN - All rights reserved.