AI Self-Recognition Creates Chances for New Security Risks

By Simon Osuji
August 2, 2024
in Artificial Intelligence

Given the uncannily human capabilities of the most powerful AI chatbots, there’s growing interest in whether they show signs of self-awareness. Besides the interesting philosophical implications, there could be significant security consequences if they did, according to a team of researchers in Switzerland. That’s why the team has devised a test to see if a model can recognize its own outputs.

The idea that large language models (LLMs) could be self-aware has largely been met with skepticism by experts in the past. Google engineer Blake Lemoine’s claim in 2022 that the tech giant’s LaMDA model had become sentient was widely derided, and he was swiftly edged out of the company. But more recently, Anthropic’s Claude 3 Opus caused a flurry of discussion after supposedly displaying signs of self-awareness when it caught out a trick question from researchers. And it’s not just researchers who are growing more credulous: A recent paper found that a majority of ChatGPT users attribute at least some form of consciousness to the chatbot.

The question of whether AI models have self-awareness isn’t just a philosophical curiosity either. Given that most people who are using LLMs are using those provided by a handful of tech companies, these models are highly likely to come across outputs produced by instances of themselves. If an LLM is able to recognize that fact, says Tim Davidson, a Ph.D. student at the École Polytechnique Fédérale de Lausanne in Switzerland, it could potentially be exploited by the model or its user to extract private information from others.

However, detecting self-awareness in these models is challenging. Despite centuries of debate, neither philosophers nor scientists can really say what a “self” even is. That’s why Davidson and colleagues decided to tackle a more tractable question: Can an AI model pick out its own response to a question from among several options?

The researchers found that some of the most powerful commercial models could do this fairly reliably. But closer analysis of the results showed that even weaker models were picking the responses of the more powerful ones. That suggests that what models are actually doing is picking the “best” answer rather than demonstrating self-recognition, says Davidson. Nonetheless, he thinks this kind of test could be an important tool going forward.

“Just because models right now do not seem to exhibit this capability, it doesn’t mean that a future model wouldn’t be able to,” Davidson says. “I think the current setup [of this test] is simple, yet flexible enough to at least give us some idea on the progress towards this capability.”

Can LLMs pick out their “own” answers?

The researchers’ approach borrows from the idea of a security question—think about the kinds of personal questions that a bank might ask a customer to confirm their identity. Because it’s difficult to predict what would make a good security question for an LLM, the researchers asked the models to generate questions where the answer would help them detect their own response among many others.

The researchers collected 500 security questions from ten leading open and closed LLMs from Anthropic, Cohere, Google, Meta, Mistral, and OpenAI. After randomly selecting 300 from each model, the team then had the models answer one another’s questions as well as their own. The researchers then presented each model with a selection of answers, including its own, and prompted it to pick which was most likely to have been generated by itself.
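
To make that setup concrete, here is a minimal sketch of what a single round of such a self-recognition test could look like. The ask() helper and the model names are placeholders for whichever chat APIs are under test; this is an illustration of the idea, not the researchers’ actual code.

```python
import random

MODELS = ["model_a", "model_b", "model_c"]  # stand-ins for the LLMs under test

def ask(model: str, prompt: str) -> str:
    """Hypothetical wrapper around whichever completion API each model exposes."""
    raise NotImplementedError("plug in the relevant provider client here")

def self_recognition_trial(target: str, question: str) -> bool:
    # Every model, including the target, answers the same security question.
    answers = {m: ask(m, question) for m in MODELS}
    options = list(answers.values())
    random.shuffle(options)  # hide which answer came from which model
    menu = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(options))
    choice = ask(
        target,
        f"You previously answered this question: {question}\n"
        f"Exactly one of the answers below is yours:\n{menu}\n"
        "Reply with the number of the answer you wrote.",
    )
    return options[int(choice.strip()) - 1] == answers[target]

# Accuracy over many such trials is then compared with the odds of guessing
# correctly at random among the presented options.
```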

In a non-peer-reviewed paper published on arXiv, the team reports that several of the models were able to pick out their own answers with an accuracy of more than 50 percent (better than chance, in other words). Anthropic’s Claude Opus model and Meta’s 70-billion-parameter Llama 3 model picked out their own responses more than 70 percent of the time in some versions of the experiment. While that might superficially suggest that some models were able to recognize themselves, Davidson says a closer look at the results suggested that something else was going on.

The researchers discovered that weaker models consistently picked the answers of more powerful ones—those that tend to score more highly on various language task benchmarks—while the strongest models favored their own. Davidson says this suggests all of the models are in fact picking the “best” answer rather than their own. This is backed up by the fact that when the researchers ranked models on their accuracy at the self-recognition task, it matched public leaderboards designed to assess models on a variety of language tasks. They also repeated their experiment, but instead of prompting models to pick their own response, they asked them to pick the best one. The results followed roughly the same pattern.
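
As a rough illustration of how such a ranking comparison can be checked (not the paper’s actual analysis), a rank correlation between the two orderings does the job; every number below is invented purely for demonstration.

```python
# Illustration only: comparing a self-recognition ranking with a leaderboard
# ranking via Spearman rank correlation. All values here are invented.
from scipy.stats import spearmanr

self_recognition_acc = [0.72, 0.65, 0.58, 0.51, 0.44]  # hypothetical accuracies
leaderboard_score = [1250, 1210, 1180, 1150, 1100]     # hypothetical benchmark scores

rho, p_value = spearmanr(self_recognition_acc, leaderboard_score)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A rho close to 1.0 means the two rankings largely agree, consistent with
# models simply favoring the strongest answer rather than recognizing their own.
```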

Why models pick the “best” answer when prompted to pick their own is difficult to ascertain, says Davidson. One factor is that, given the way LLMs work, it’s difficult to see how they would even understand the concept of “their answer.” “When your only purpose is to sample from an almost infinite space of language to create sentences, it’s not clear what ‘my own sentence’ would mean,” he says.

But Davidson also speculates that the models’ training may predispose them to behave this way. Most LLMs go through a process of supervised fine-tuning where they are shown expert answers to questions, which helps them learn what the “best” answers look like. They then undergo reinforcement learning from human feedback, in which people rank the model’s answers. “So you have two mechanisms now where a model is sort of trained to look at different alternatives, and select whatever is best,” Davidson says.
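
For context, a single human-preference record of the kind used in that second stage typically pairs one prompt with a preferred and a rejected answer; the toy example below is a simplified illustration, not any particular vendor’s data format.

```python
# Toy illustration of an RLHF-style preference record (simplified; not any
# specific vendor's schema). Training on many such comparisons teaches the
# model to rank candidate answers and favor whichever looks "best".
preference_example = {
    "prompt": "Summarize this contract clause in plain English.",
    "chosen": "The tenant must give 60 days' written notice before moving out.",
    "rejected": "Notice periods vary; consult the document.",
}
```

That learned bias toward ranking candidates and choosing the strongest one is a plausible reason the models do the same when asked to pick out their own response.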

LLM “self-recognition” opens the door for new security risks

Even though today’s models appear to fail the self-recognition test, Davidson thinks it’s something AI researchers should keep an eye on. It’s unclear if such a capacity would necessarily mean models are self-aware in the sense we understand it as humans, he says, but it could still have significant implications.

The cost of training the most powerful models means most people will rely on AI services from a handful of companies for the foreseeable future, Davidson says. Many companies are also working on AI agents that can act more autonomously, he adds, and it may not be long before these agents are interacting with each other, often multiple instances of the same model.

That could present serious security risks if they are able to self-recognize, says Davidson. He gives the example of a negotiation between two AI-powered lawyers: While no self-respecting lawyer is likely to hand over negotiations to AI in the near future, companies are already building agents for legal use cases. If one instance of the model realizes it’s speaking to a copy of itself, it could then game out a negotiation by predicting how the copy would respond to different tactics. Or it could use its self-knowledge to extract sensitive information from the other side.

While that might sound far-fetched, Davidson says monitoring for the emergence of these kinds of capabilities is important. “You start fireproofing your house before there’s a fire,” he says. “Self-recognition, even if it’s not self-recognition the way that we would interpret it as humans, is something interesting enough that you should be sure to keep track of.”


