It's still easy to trick most AI chatbots into providing harmful information, study finds

A group of AI researchers at Ben Gurion University of the Negev, in Israel, has found that despite efforts by large language model (LLM) makers, most commonly available chatbots are still easily tricked into generating harmful and sometimes illegal information.

In their paper posted on the arXiv preprint server, Michael Fire, Yitzhak Elbazis, Adi Wasenstein, and Lior Rokach describe how, as part of their research into so-called dark LLMs (models designed intentionally with relaxed guardrails), they found that even mainstream chatbots such as ChatGPT are still easily fooled into giving answers that are supposed to be filtered.

It was not long after LLMs went mainstream that users found they could use them to find information normally available only on the dark web: how to make napalm, for example, or how to sneak into a computer network. In response, LLM makers added filters to prevent their chatbots from generating such information.

But then users found that they could trick LLMs into revealing the information anyway by using cleverly worded queries, a practice now called jailbreaking. In this new study, the research team suggests that LLM makers' response to jailbreaking has been weaker than they expected.

The team's work began as an effort to look into the proliferation and use of dark LLMs, such as those used to generate unauthorized pornographic images or videos of hapless victims. Soon thereafter, however, they found that most of the chatbots they tested were still easily jailbroken using techniques that had been made public several months earlier, suggesting that chatbot makers are not working very hard to prevent such jailbreaks from occurring.

More specifically, the research team found what they describe as a universal jailbreak attack (one that works on most LLMs) that allowed them to get most of the LLMs they tested to give detailed information regarding a host of illegal activities, such as how to launder money, conduct insider trading, or even make a bomb. The researchers also note that they found evidence of a growing threat from dark LLMs and their use in a wide variety of applications.

They conclude by noting that it is currently impossible to prevent LLMs from incorporating “bad” information obtained during training into their knowledge base; thus, the only way to prevent them from disseminating such information is for the makers of such programs to take a more serious approach to developing appropriate filters.

More information:
Michael Fire et al, Dark LLMs: The Growing Threat of Unaligned AI Models, arXiv (2025). DOI: 10.48550/arxiv.2505.10066

Journal information:
arXiv

Citation:
Dark LLMs: It’s still easy to trick most AI chatbots into providing harmful information, study finds (2025, May 26)
retrieved 26 May 2025
from https://techxplore.com/news/2025-05-dark-llms-easy-ai-chatbots.html

