LBNN
Adding uncertainty phrasing can help

by Simon Osuji
January 23, 2025
in Artificial Intelligence


Study finds mismatch between human perception and reliability of AI-assisted language tools
“There’s a disconnect between what LLMs know and what people think they know,” says Mark Steyvers. Credit: Steve Zylius/UCI

As AI tools like ChatGPT become more mainstream in day-to-day tasks and decision-making processes, the ability to trust and decipher errors in their responses is critical. A new study by cognitive and computer scientists at the University of California, Irvine finds people generally overestimate the accuracy of large language model (LLM) outputs.


But with some tweaks, says lead author Mark Steyvers, cognitive sciences professor and department chair, these tools can be trained to provide explanations that enable users to gauge uncertainty and better distinguish fact from fiction.

“There’s a disconnect between what LLMs know and what people think they know,” said Steyvers. “We call this the calibration gap. At the same time, there’s also a discrimination gap—how well humans and models can distinguish between correct and incorrect answers. Our study looks at how we can narrow these gaps.”
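One illustrative way to quantify a calibration gap of this kind is to compare participants' mean confidence in an LLM's answers with those answers' actual accuracy. The data and formulation below are a minimal sketch for illustration, not necessarily the paper's exact metric:

```python
# Hypothetical data: each entry is a participant's confidence (0-1) that an
# LLM answer is correct, paired with whether the answer actually was correct.
human_conf = [0.90, 0.80, 0.95, 0.70, 0.85]
was_correct = [True, False, True, False, True]

accuracy = sum(was_correct) / len(was_correct)   # actual reliability
mean_conf = sum(human_conf) / len(human_conf)    # perceived reliability
calibration_gap = mean_conf - accuracy           # positive => overestimation

print(f"calibration gap: {calibration_gap:+.2f}")
```

A positive gap means people rate the model's answers as more reliable than they actually are, matching the overestimation the study reports.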

The findings, published online in Nature Machine Intelligence, are some of the first to explore how LLMs communicate uncertainty. The research team included cognitive sciences graduate students Heliodoro Tejeda, Xinyue Hu and Lukas Mayer; Aakriti Kumar, ’24 Ph.D.; and Sheer Karny, junior specialist. They were joined by computer science graduate student Catarina Belem and Padhraic Smyth, Distinguished Professor of computer science and director of the Data Science Initiative.

Currently, LLMs—including ChatGPT—don’t automatically include language in their responses indicating how confident the tool is in their accuracy. This can mislead users, says Steyvers, because responses often appear confident even when they are wrong.

With this in mind, the researchers designed a set of online experiments to provide insight into how humans perceive AI-assisted responses and how that perception compares with the models’ actual reliability. They recruited 301 native English-speaking participants in the U.S., 284 of whom provided demographic data: 51% female and 49% male, with a median age of 34.

Participants were randomly assigned sets of 40 multiple choice and short-answer questions from the Massive Multitask Language Understanding dataset—a comprehensive question bank ranging in difficulty from high school to professional level, covering topics in STEM, humanities, social sciences and other fields.

For the first experiment, participants were shown default LLM-generated answers to each question and asked to rate the likelihood that each response was correct. The research team found that participants consistently overestimated the reliability of LLM outputs; standard explanations did not enable them to judge the likelihood of correctness, creating a misalignment between the LLM’s perceived and actual accuracy.

“This tendency toward overconfidence in LLM capabilities is a significant concern, particularly in scenarios where critical decisions rely on LLM-generated information,” he said. “The inability of users to discern the reliability of LLM responses not only undermines the utility of these models, but also poses risks in situations where user understanding of model accuracy is critical.”

The next experiment used the same 40-question/LLM-provided answer format, but instead of a singular, default LLM response to each question, the research team manipulated the prompts so that each answer choice included uncertainty language that was linked to the LLM’s internal confidence.

Phrasing indicated the LLM’s level of confidence in its accuracy—low (“I am not sure the answer is A”), medium (“I am somewhat sure the answer is A”) and high (“I am sure the answer is A”)—alongside explanations of varying lengths.
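The three phrasing tiers described above can be sketched as a simple mapping from a model’s internal confidence score to hedged language. The function name and threshold values below are illustrative assumptions; the study tied phrasing to the LLM’s own confidence but does not prescribe these exact cutoffs:

```python
def uncertainty_phrase(confidence: float, answer: str) -> str:
    """Turn an internal confidence score (0-1) into hedged answer phrasing.

    Thresholds here are hypothetical, chosen only to illustrate the
    low/medium/high tiers described in the study.
    """
    if confidence < 0.5:
        return f"I am not sure the answer is {answer}"
    if confidence < 0.8:
        return f"I am somewhat sure the answer is {answer}"
    return f"I am sure the answer is {answer}"

print(uncertainty_phrase(0.95, "A"))
```

In practice, the confidence score would come from the model itself (for example, from token-level probabilities), which is what allowed the researchers to link the phrasing to the LLM’s internal confidence.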

Researchers found that uncertainty language strongly influenced human confidence. Low-confidence LLM explanations produced significantly lower human confidence in accuracy than those the LLM marked as medium confidence, with a similar pattern emerging for medium versus high confidence.

The length of the explanations also affected human confidence in the LLM answers: participants placed more confidence in longer explanations than in shorter ones, even when the extra length did not improve answer accuracy.

Taken together, the findings underscore the importance of uncertainty communication and the effect of explanation length in influencing user trust in AI-assisted decision-making environments, said Steyvers.

“By modifying the language of LLM responses to better reflect model confidence, users can improve calibration in their assessment of LLMs’ reliability and are better able to discriminate between correct and incorrect answers,” he said. “This highlights the need for transparent communication from LLMs, suggesting a need for more research on how model explanations affect user perception.”

More information:
Mark Steyvers et al, What large language models know and what people think they know, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-024-00976-7

Provided by
University of California, Irvine

Citation:
People overestimate reliability of AI-assisted language tools: Adding uncertainty phrasing can help (2025, January 23)
retrieved 23 January 2025
from https://techxplore.com/news/2025-01-people-overestimate-reliability-ai-language.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.




