Scientific literature reviews are a critical part of advancing fields of study: They provide a current state of the union through comprehensive analysis of existing research, and they identify gaps in knowledge where future studies might focus. Writing a well-done review article, however, is a complex and labor-intensive undertaking.
Researchers often comb through reams of scholarly works. They must select studies that aren’t outdated, yet avoid recency bias. Then comes the intensive work of assessing studies’ quality, extracting relevant data from works that make the cut, analyzing data to glean insights, and writing a cogent narrative that sums up the past while looking to the future. Research synthesis is a field of study unto itself, and even excellent scientists may not write excellent literature reviews.
Enter artificial intelligence. As in so many industries, a crop of startups has emerged to leverage AI to speed, simplify, and revolutionize the scientific literature review process. Many of these startups position themselves as AI search engines centered on scholarly research—each with differentiating product features and target audiences.
Elicit invites searchers to “analyze research papers at superhuman speed” and highlights its use by expert researchers at institutions like Google, NASA, and the World Bank. Scite says it has built the largest citation database by continually monitoring 200 million scholarly sources, and it offers “smart citations” that categorize takeaways into supporting or contrasting evidence. Consensus features a homepage demo that seems aimed at helping laypeople gain a more robust understanding of a given question, explaining the product as “Google Scholar meets ChatGPT” and offering a consensus meter that sums up major takeaways. These are but a few of many.
But can AI replace high-quality, systematic scientific literature review?
Experts on research synthesis tend to agree that these AI models are currently great-to-excellent at performing qualitative analyses—in other words, creating a narrative summary of scientific literature. Where they fall short is the more complex quantitative layer that makes a review truly systematic. This quantitative synthesis typically involves statistical methods such as meta-analysis, which pools numerical data across multiple studies to draw more robust conclusions.
“AI models can be almost 100 percent as good as humans at summarizing the key points and writing a fluid argument,” says Joshua Polanin, co-founder of the Methods of Synthesis and Integration Center (MOSAIC) at the American Institutes for Research. “But we’re not even 20 percent of the way there on quantitative synthesis,” he says. “Real meta-analysis follows a strict process in how you search for studies and quantify results. These numbers are the basis for evidence-based conclusions. AI is not close to being able to do that.”
The Trouble with Quantification
The quantification process can be challenging even for trained experts, Polanin explains. Both humans and AI can generally read a study and summarize the takeaway: Study A found an effect, or Study B did not find an effect. The tricky part is placing a number value on the extent of the effect. What’s more, there are often different ways to measure effects, and researchers must identify studies and measurement designs that align with the premise of their research question.
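To make that quantification concrete, here is a minimal sketch of the arithmetic at the core of one standard approach, a fixed-effect meta-analysis: each study’s effect size is weighted by the inverse of its variance, so more precise studies pull the pooled estimate harder. The effect sizes and standard errors below are hypothetical numbers for illustration, not data from any real study or from any of these products.

```python
import math

# (effect size, standard error) pairs extracted from three hypothetical studies
studies = [(0.30, 0.10), (0.45, 0.15), (0.10, 0.20)]

# Inverse-variance weights: the more precise a study, the more it counts
weights = [1 / se**2 for _, se in studies]

# Pooled fixed-effect estimate and its standard error
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")
```

The hard part, as Polanin notes, isn’t this arithmetic; it’s deciding which studies belong in the list at all, and extracting comparable effect sizes from papers that measure outcomes in different ways.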
Polanin says models must first identify and extract the relevant data, and then they must make nuanced calls on how to compare and analyze it. “Even as human experts, although we try to make decisions ahead of time, you might end up having to change your mind on the fly,” he says. “That isn’t something a computer will be good at.”
Given the hubris that pervades AI and startup culture, one might expect the companies building these AI models to protest Polanin’s assessment. But you won’t get an argument from Eric Olson, co-founder of Consensus: “I couldn’t agree more, honestly,” he says.
To Polanin’s point, Consensus is intentionally “higher-level than some other tools, giving people a foundational knowledge for quick insights,” Olson adds. He sees the quintessential user as a grad student: someone with an intermediate knowledge base who’s working on becoming an expert. Consensus can be one tool of many for a true subject matter expert, or it can help a non-scientist stay informed—like a Consensus user in Europe who stays abreast of the research about his child’s rare genetic disorder. “He had spent hundreds of hours on Google Scholar as a non-researcher. He told us he’d been dreaming of something like this for 10 years, and it changed his life—now he uses it every single day,” Olson says.
Over at Elicit, the team targets a different type of ideal customer: “Someone working in industry in an R&D context, maybe within a biomedical company, trying to decide whether to move forward with the development of a new medical intervention,” says James Brady, head of engineering.
With that high-stakes user in mind, Elicit clearly shows users claims of causality and the evidence that supports them. The tool breaks down the complex task of literature review into manageable pieces that a human can understand, and it also provides more transparency than your average chatbot: Researchers can see how the AI model arrived at an answer and can check it against the source.
The Future of Scientific Review Tools
Brady agrees that current AI models aren’t providing full Cochrane-style systematic reviews—but he says this is not a fundamental technical limitation. Rather, it’s a question of future advances in AI and better prompt engineering. “I don’t think there’s something our brains can do that a computer can’t, in principle,” Brady says. “And that goes for the systematic review process too.”
Roman Lukyanenko, a University of Virginia professor who specializes in research methods, agrees that a major future focus should be developing ways to support the initial prompt process to glean better answers. He also notes that current models tend to prioritize journal articles that are freely accessible, yet plenty of high-quality research exists behind paywalls. Still, he’s bullish about the future.
“I believe AI is tremendous—revolutionary on so many levels—for this space,” says Lukyanenko, who with Gerit Wagner and Guy Paré co-authored a pre-ChatGPT 2022 study about AI and literature review that went viral. “We have an avalanche of information, but our human biology limits what we can do with it. These tools represent great potential.”
Progress in science often comes from an interdisciplinary approach, he says, and this is where AI’s potential may be greatest. “We have the term ‘Renaissance man,’ and I like to think of ‘Renaissance AI’: something that has access to a big chunk of our knowledge and can make connections,” Lukyanenko says. “We should push it hard to make serendipitous, unanticipated, distal discoveries between fields.”