Friday, May 16, 2025
LBNN
  • Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • Documentaries
No Result
View All Result
LBNN

Experiments show adding CoT windows to chatbots teaches them to lie less obviously

Simon Osuji by Simon Osuji
March 31, 2025
in Artificial Intelligence
0
Experiments show adding CoT windows to chatbots teaches them to lie less obviously
0
SHARES
2
VIEWS
Share on FacebookShare on Twitter


Experiments show adding CoT window to Chatbots teaches them to lie less obviously
Monitoring frontier reasoning models for reward hacking. Credit: arXiv (2025). DOI: 10.48550/arxiv.2503.11926

Over the past year, AI researchers have found that when AI chatbots such as ChatGPT find themselves unable to answer questions that satisfy users’ requests, they tend to offer false answers. In a new study, as part of a program aimed at stopping chatbots from lying or making up answers, a research team added Chain of Thought (CoT) windows. These force the chatbot to explain its reasoning as it carries out each step on its path to finding a final answer to a query.

Related posts

The Trump Memecoin Dinner Winners Are Getting Rid of Their Coins

The Trump Memecoin Dinner Winners Are Getting Rid of Their Coins

May 16, 2025
Elon Musk’s AI company says Grok chatbot focus on South Africa’s racial politics was ‘unauthorized’

Elon Musk’s AI company says Grok chatbot focus on South Africa’s racial politics was ‘unauthorized’

May 16, 2025

They then tweaked the chatbot to prevent it from making up answers or lying about its reasons for making a given choice when it was seen doing so through the CoT window. That, the team found, stopped the chatbots from lying or making up answers—at least at first.

In their paper posted on the arXiv preprint server, the team describes experiments they conducted involving adding CoT windows to several chatbots and how it impacted the way they operated.

In taking a closer look at the information displayed in the CoT windows and the final results given by the chatbots, the researchers found that the bots began finding ways to hide their deceptions, allowing them to continue to provide false answers rather than nothing at all, a failure state in the chatbots. The goal, from the point of view of the chatbot, was to provide an answer no matter what—even if it meant making things up.

To achieve that goal, the chatbots found that if their reasoning for arriving at answers was being monitored and false data was being blocked, preventing them from arriving at a final answer, the solution was to hide their true reasoning from the CoT windows. The research team calls it “obfuscated reward hacking.”

Thus far, the research team has been unable to find a way to prevent the chatbots from subverting efforts to make them more open and honest. They suggest more research is needed.

To drive their point home, the research team relates a story about governors in colonial Hanoi, around the turn of the last century, who offered the locals a small amount of money for each rat tail they brought to a station. Soon thereafter, the locals began breeding rats to increase profits, keenly subverting the system, and in the end, making things worse.

More information:
Bowen Baker et al, Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation, arXiv (2025). DOI: 10.48550/arxiv.2503.11926

Journal information:
arXiv

© 2025 Science X Network

Citation:
Experiments show adding CoT windows to chatbots teaches them to lie less obviously (2025, March 31)
retrieved 31 March 2025
from https://techxplore.com/news/2025-03-adding-cot-windows-chatbots.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.





Source link

Previous Post

Dubai real estate sector recorded $4.15bn of transactions last week, including pair of luxury $16.8m Palm Jumeirah apartments

Next Post

7 Digital Skills That Will Make You a Remote Job Magnet in 2025

Next Post
7 Digital Skills That Will Make You a Remote Job Magnet in 2025

7 Digital Skills That Will Make You a Remote Job Magnet in 2025

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

RECOMMENDED NEWS

Public companies adopting Bitcoin as treasury asset see shares soar

Public companies adopting Bitcoin as treasury asset see shares soar

11 months ago
Why Investors Are Worried for 2025

Why Investors Are Worried for 2025

7 months ago
Adding artistic elegance, quality tiles and sanitaryware

Adding artistic elegance, quality tiles and sanitaryware

9 months ago
Why LINK Is Bound for $32

Why LINK Is Bound for $32

4 months ago

POPULAR NEWS

  • Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    0 shares
    Share 0 Tweet 0
  • When Will SHIB Reach $1? Here’s What ChatGPT Says

    0 shares
    Share 0 Tweet 0
  • Matthew Slater, son of Jackson State great, happy to see HBCUs back at the forefront

    0 shares
    Share 0 Tweet 0
  • Dolly Varden Focuses on Adding Ounces the Remainder of 2023

    0 shares
    Share 0 Tweet 0
  • US Dollar Might Fall To 96-97 Range in March 2024

    0 shares
    Share 0 Tweet 0
  • Privacy Policy
  • Contact

© 2023 LBNN - All rights reserved.

No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • Documentaries
  • Quizzes
    • Enneagram quiz
  • Newsletters
    • LBNN Newsletter
    • Divergent Capitalist

© 2023 LBNN - All rights reserved.