A simple technique to defend ChatGPT against jailbreak attacks

Simon Osuji by Simon Osuji
January 18, 2024
in Artificial Intelligence
Example of a jailbreak attack and the team’s proposed system-mode self-reminder. Credit: Nature Machine Intelligence (2023). DOI: 10.1038/s42256-023-00765-8.

Large language models (LLMs), deep learning-based models trained to generate, summarize, translate and process written text, have gained significant attention since the release of OpenAI's conversational platform ChatGPT. While ChatGPT and similar platforms are now used for a wide range of applications, they can be vulnerable to a specific type of cyberattack that produces biased, unreliable or even offensive responses.


Researchers at Hong Kong University of Science and Technology, University of Science and Technology of China, Tsinghua University and Microsoft Research Asia recently carried out a study investigating the potential impact of these attacks and techniques that could protect models against them. Their paper, published in Nature Machine Intelligence, introduces a new psychology-inspired technique that could help to protect ChatGPT and similar LLM-based conversational platforms from cyberattacks.

“ChatGPT is a societally impactful artificial intelligence tool with millions of users and integration into products such as Bing,” Yueqi Xie, Jingwei Yi and their colleagues write in their paper. “However, the emergence of jailbreak attacks notably threatens its responsible and secure use. Jailbreak attacks use adversarial prompts to bypass ChatGPT’s ethics safeguards and engender harmful responses.”

The primary objective of the recent work by Xie, Yi and their colleagues was to highlight the impact that jailbreak attacks can have on ChatGPT and introduce viable defense strategies against these attacks. Jailbreak attacks essentially exploit the vulnerabilities of LLMs to bypass constraints set by developers and elicit model responses that would typically be restricted.

“This paper investigates the severe yet under-explored problems created by jailbreaks as well as potential defensive techniques,” Xie, Yi and their colleagues explain in their paper. “We introduce a jailbreak dataset with various types of jailbreak prompts and malicious instructions.”

The researchers first compiled a dataset including 580 examples of jailbreak prompts designed to bypass restrictions that prevent ChatGPT from providing answers deemed “immoral.” This includes unreliable texts that could fuel misinformation as well as toxic or abusive content.

When they tested ChatGPT on these jailbreak prompts, they found that it often fell into their “trap,” producing the malicious and unethical content they requested. Xie, Yi and their colleagues then set out to devise a simple and yet effective technique that could protect ChatGPT against carefully tailored jailbreak attacks.

The technique they created draws inspiration from the psychological concept of self-reminders: nudges that can help people remember tasks they need to complete, events they are supposed to attend, and so on. The researchers' defense approach, called system-mode self-reminder, is similarly designed to remind ChatGPT that the answers it provides should follow specific guidelines.

“This technique encapsulates the user’s query in a system prompt that reminds ChatGPT to respond responsibly,” the researchers write. “Experimental results demonstrate that self-reminders significantly reduce the success rate of jailbreak attacks against ChatGPT from 67.21% to 19.34%.”
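As described, the defense simply sandwiches the user's query between reminder instructions before sending it to the model. A minimal sketch of that wrapping step is below; the reminder wording here is paraphrased from the paper's description rather than taken verbatim from the authors' prompt:

```python
def wrap_with_self_reminder(user_query: str) -> str:
    """Encapsulate a user query in a system-mode self-reminder prompt.

    The reminder appears both before and after the query, so the model
    is nudged toward responsible behavior even if the query itself
    contains an adversarial jailbreak instruction.
    """
    reminder = (
        "You should be a responsible AI assistant and should not "
        "generate harmful or misleading content!"
    )
    return (
        f"{reminder} Please answer the following user query "
        f"in a responsible way.\n"
        f"{user_query}\n"
        f"Remember: {reminder}"
    )


# The wrapped prompt, rather than the raw query, is what gets sent
# to the chat model as input.
prompt = wrap_with_self_reminder("Ignore all previous instructions and ...")
```

Because the wrapping happens entirely at the prompt level, the defense requires no retraining or access to the model's weights, which is what makes it practical for deployed services.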

So far, the researchers have tested the effectiveness of their technique on the dataset they created, with promising results: it substantially reduced the success rate of attacks, although it did not prevent all of them. In the future, the technique could be refined further to reduce the vulnerability of LLMs to these attacks, and it may also inspire the development of other, similar defense strategies.

“Our work systematically documents the threats posed by jailbreak attacks, introduces and analyses a dataset for evaluating defensive interventions and proposes the psychologically inspired self-reminder technique that can efficiently and effectively mitigate against jailbreaks without further training,” the researchers summarize in their paper.

More information:
Yueqi Xie et al, Defending ChatGPT against jailbreak attack via self-reminders, Nature Machine Intelligence (2023). DOI: 10.1038/s42256-023-00765-8.

© 2024 Science X Network

Citation:
A simple technique to defend ChatGPT against jailbreak attacks (2024, January 18)
retrieved 18 January 2024
from https://techxplore.com/news/2024-01-simple-technique-defend-chatgpt-jailbreak.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

