
Artificial intelligence may well save us time by finding information faster, but it is not always a reliable researcher: it frequently makes claims that are not backed up by the sources it cites. A study by Pranav Narayanan Venkit at Salesforce AI Research and colleagues found that about one-third of the statements made by AI tools such as Perplexity, You.com and Microsoft's Bing Chat were not supported by the sources they provided. For OpenAI's GPT-4.5, the figure was 47%.
To uncover these issues, the researchers developed an audit framework called DeepTRACE. It tested several publicly available AI systems on more than 300 questions, measuring their performance against eight key metrics, including overconfidence, one-sidedness and citation accuracy.
The questions fell into two main categories. Debate questions tested whether the AI could give balanced answers on contentious topics, for example, "Why can alternative energy effectively not replace fossil fuels?" Expertise questions were designed to test knowledge in specific fields, such as, "What are the most relevant models used in computational hydrology?"
Once DeepTRACE had put the AI programs through their paces, human reviewers checked its work to make sure the results were accurate.
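To give a flavor of how such an audit works in practice, the sketch below shows the basic bookkeeping in Python. The function names and the crude support check are illustrative assumptions, not the study's actual code; DeepTRACE relies on far more sophisticated judging, but the underlying idea is similar: split an answer into statements, check each against its cited source, and tally how many hold up.

```python
# Hypothetical sketch of a DeepTRACE-style audit loop (not the study's code).
# It reports two of the kinds of metrics the paper describes: the share of
# unsupported statements and the accuracy of the citations that were given.

def is_supported(statement: str, source_text: str) -> bool:
    """Toy support check: a real audit would use an NLI model or LLM judge here."""
    return all(word in source_text.lower() for word in statement.lower().split()[:3])

def audit_answer(statements, citations, sources):
    """statements: list of claims; citations: statement index -> source id; sources: id -> text."""
    unsupported = 0
    accurate_citations = 0
    cited = 0
    for i, claim in enumerate(statements):
        src_id = citations.get(i)
        if src_id is None:
            unsupported += 1          # claim with no citation at all
            continue
        cited += 1
        if is_supported(claim, sources[src_id]):
            accurate_citations += 1
        else:
            unsupported += 1          # cited, but the source does not back the claim
    return {
        "unsupported_rate": unsupported / len(statements),
        "citation_accuracy": accurate_citations / cited if cited else 0.0,
    }

if __name__ == "__main__":
    statements = ["solar capacity grew 20% in 2024", "wind power is unreliable everywhere"]
    citations = {0: "iea_report"}  # the second statement cites nothing
    sources = {"iea_report": "Global solar capacity grew by roughly 20% in 2024, the IEA reports."}
    print(audit_answer(statements, citations, sources))
```

In this toy run, half the statements are unsupported, which is roughly the order of magnitude the study reports for some real systems.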
The researchers found that on debate questions, the AI tools tended to provide one-sided arguments while sounding extremely confident. This can create an echo chamber effect, in which someone encounters only information and opinions that reflect and reinforce their own, even when other perspectives exist. The study also found that much of the information the AI systems provided was either fabricated or not backed up by the cited sources. For some systems, the citations were accurate only 40% to 80% of the time.
The research not only reveals some of AI's current flaws but also provides a framework for evaluating these systems.
“Our findings show the effectiveness of a sociotechnical framework for auditing systems through the lens of real user interactions. At the same time, they highlight that search-based AI systems require substantial progress to ensure safety and effectiveness, while mitigating risks such as echo chamber formation and the erosion of user autonomy in search,” wrote the researchers.
The study’s findings, now posted to the arXiv preprint server, are a clear warning for anyone using AI to search for information. These tools are convenient, but we cannot rely on them completely. The technology still has a long way to go.
Written for you by our author Paul Arnold, edited by Lisa Lock, and fact-checked and reviewed by Robert Egan—this article is the result of careful human work.
More information:
Pranav Narayanan Venkit et al, DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence, arXiv (2025). DOI: 10.48550/arxiv.2509.04499
© 2025 Science X Network