
New benchmarking tool evaluates the factuality of LLMs

by Simon Osuji
August 22, 2024
in Artificial Intelligence
Overview of WILDHALLUCINATIONS. Credit: arXiv (2024). DOI: 10.48550/arxiv.2407.17468

A team of AI researchers and computer scientists from Cornell University, the University of Washington and the Allen Institute for Artificial Intelligence has developed a benchmarking tool called WILDHALLUCINATIONS to evaluate the factuality of large language models (LLMs). The group has published a paper on the arXiv preprint server describing how they built the tool.

LLMs such as ChatGPT have become popular: people use them to write letters, poems, songs, research papers and other documents. Over time, however, their shortcomings have become clear. LLMs often make inaccurate statements, and those that veer too far from reality have come to be known as hallucinations.

The research team notes that the main cause of LLM hallucinations is the quality of the training data, which generally consists of massive amounts of text from the internet. Models trained on specific, highly accurate datasets are therefore much more likely to provide accurate information.

The researchers also noted that many LLM makers claim their revised models hallucinate less often, implying that they are more accurate, but that users have so far had no way to verify such claims. For this study, the team created a tool to help the user community evaluate some of the most popular LLMs for accuracy.

Called WILDHALLUCINATIONS, the benchmark prompts multiple LLMs to generate text about entities drawn from real user-chatbot conversations and then fact-checks the answers. Because many chatbot answers draw on information from Wikipedia, the team tracked differences between answers about topics covered by Wikipedia and those that are not.
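To make the scoring idea concrete, here is a minimal sketch, assuming each model answer has already been split into individual claims and each claim has been labeled as supported or unsupported by some external fact-checker (which is not implemented here). It is loosely modeled on claim-level factuality scoring and is not the researchers' actual code or metric; the `Generation` structure, the `factuality` and `score_by_wikipedia` functions, and the example entities are all hypothetical. It scores each answer as the fraction of supported claims and averages those scores separately for entities with and without a Wikipedia page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Generation:
    """One model answer about a single entity (hypothetical structure)."""
    entity: str
    has_wikipedia_page: bool
    # One verdict per claim extracted from the answer:
    # True if an (assumed, external) fact-checker found the claim supported.
    claim_verdicts: list[bool]


def factuality(gen: Generation) -> float:
    """Fraction of claims judged supported; 0.0 if the answer has no claims."""
    if not gen.claim_verdicts:
        return 0.0
    return sum(gen.claim_verdicts) / len(gen.claim_verdicts)


def score_by_wikipedia(gens: list[Generation]) -> dict[str, float]:
    """Average factuality, split by whether the entity has a Wikipedia page."""
    groups: dict[str, list[float]] = {"wiki": [], "non_wiki": []}
    for g in gens:
        key = "wiki" if g.has_wikipedia_page else "non_wiki"
        groups[key].append(factuality(g))
    # Skip empty groups so mean() is never called on an empty list.
    return {k: mean(v) for k, v in groups.items() if v}


if __name__ == "__main__":
    # Hypothetical answers with made-up verdicts, for illustration only.
    gens = [
        Generation("Entity A", True, [True, True, True, False]),
        Generation("Entity B", True, [True, True]),
        Generation("Entity C", False, [True, False, False]),
    ]
    print(score_by_wikipedia(gens))  # wiki: 0.875, non_wiki: about 0.333
```

Treating an answer with no claims as scoring 0.0 is an arbitrary choice in this sketch; a real benchmark might instead count such answers as abstentions and report them separately.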

To test the benchmark, the researchers used it to evaluate several of the most popular LLMs, many of which had recently been updated. They found that LLM makers have made little progress in improving accuracy: most updated models were no more accurate than their earlier versions.

The team also found that most of the models performed better on topics covered by one or more Wikipedia pages. Accuracy also varied by subject: the models had trouble, for example, providing reliable information about celebrities and financial topics, but were more reliable on certain types of science questions.

More information:
Wenting Zhao et al, WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries, arXiv (2024). DOI: 10.48550/arxiv.2407.17468. arxiv.org/abs/2407.17468

Journal information:
arXiv

© 2024 Science X Network

Citation:
New benchmarking tool evaluates the factuality of LLMs (2024, August 21)
retrieved 21 August 2024
from https://techxplore.com/news/2024-08-benchmarking-tool-factuality-llms.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

© 2026 LBNN – All rights reserved.