Retraining AI to fortify itself against rogue rewiring even after key layers are removed

by Simon Osuji
September 5, 2025
in Artificial Intelligence
Researchers fortify AI against rogue rewiring
(A) We investigate early exits from different image encoder layers and find that VLM safety alignment varies, leading to what we term Image Encoder Early Exit (ICET) vulnerability. We propose Layer-wise Clip-PPO (L-PPO) to alleviate ICET. (B) With the same input (image and prompt), choosing different image encoder layers significantly affects the safety of the output response. (C) Safety training is applied with the model’s default settings and architecture, but limited generalization creates vulnerabilities, leaving parts of the embedding space uncovered when architectural changes occur (e.g., using a different intermediate layer embedding than during training). Credit: arXiv (2024). DOI: 10.48550/arxiv.2411.04291

As generative AI models move from massive cloud servers to phones and cars, they’re stripped down to save power. But what gets trimmed can include the technology that stops them from spewing hate speech or offering roadmaps for criminal activity.

To counter this threat, researchers at the University of California, Riverside, have developed a method to preserve AI safeguards even when open-source AI models are stripped down to run on lower-power devices. Their work is published on the arXiv preprint server.

Unlike proprietary AI systems, open‑source models can be downloaded, modified, and run offline by anyone. Their accessibility promotes innovation and transparency but also creates challenges when it comes to oversight. Without the cloud infrastructure and constant monitoring available to closed systems, these models are vulnerable to misuse.

The UCR researchers focused on a key issue: carefully designed safety features erode when open-source AI models are reduced in size. This happens because lower‑power deployments often skip internal processing layers to conserve memory and computational power. Dropping layers improves the models’ speed and efficiency, but it can also lead to answers containing pornography or detailed instructions for making weapons.
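The effect of skipping layers can be illustrated with a toy sketch. It is not the researchers' code or the LLaVA architecture: the four-layer "encoder", the `make_layer` helper, and the `exit_layer` parameter are all hypothetical names chosen for illustration. It only shows that an embedding taken from an intermediate layer differs from the final-layer embedding that the rest of the model, including its safety training, was tuned to expect.

```python
import math

def make_layer(scale, shift):
    """Return a toy 'layer': an elementwise tanh of a scaled, shifted input."""
    def layer(vec):
        return [math.tanh(scale * x + shift) for x in vec]
    return layer

# Hypothetical 4-layer encoder; real image encoders have many more layers.
LAYERS = [make_layer(1.0, 0.1), make_layer(0.8, -0.2),
          make_layer(1.2, 0.3), make_layer(0.9, 0.0)]

def encode(vec, exit_layer=None):
    """Run the input through the first `exit_layer` layers (all layers if None)."""
    n = len(LAYERS) if exit_layer is None else exit_layer
    out = list(vec)
    for layer in LAYERS[:n]:
        out = layer(out)
    return out

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

image = [0.5, -1.0, 2.0]             # stand-in for image features
full = encode(image)                 # default: every layer is used
early = encode(image, exit_layer=2)  # early exit: later layers are skipped
gap = l2_distance(full, early)
print(f"distance between full and early-exit embeddings: {gap:.3f}")
```

A nonzero distance means the downstream model receives an input it did not see during safety training, which is the kind of uncovered region of the embedding space that the figure caption describes.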

“Some of the skipped layers turn out to be essential for preventing unsafe outputs,” said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. “If you leave them out, the model may start answering questions it shouldn’t.”

The team’s solution was to retrain the model’s internal structure so that its ability to detect and block dangerous prompts is preserved, even when key layers are removed. Their approach avoids external filters or software patches. Instead, it changes how the model understands risky content at a fundamental level.
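The figure caption names the retraining method Layer-wise Clip-PPO (L-PPO), but this article does not spell out its details. As a rough sketch only, the snippet below shows the generic clipped surrogate objective from standard PPO and averages it over samples collected at several exit layers. The per-layer loop, the sample values, and the idea that a positive advantage means "refused a harmful prompt" are assumptions made for illustration, not the authors' implementation.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped objective for one sample (a quantity to maximize).

    ratio:     new-policy probability divided by old-policy probability
    advantage: how much better the sampled response was than expected
    """
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

def mean_objective(samples, epsilon=0.2):
    """Average the clipped objective over (ratio, advantage) pairs."""
    values = [clipped_surrogate(r, a, epsilon) for r, a in samples]
    return sum(values) / len(values)

# Hypothetical samples keyed by image-encoder exit layer (assumed data).
samples_by_exit_layer = {
    8:  [(1.5, 1.0), (0.5, -1.0)],
    16: [(1.0, 2.0), (1.1, 0.5)],
}

# Assumption: the objective is computed for each exit layer in turn, so the
# policy is rewarded for safe behavior regardless of which layer feeds it.
for exit_layer, samples in samples_by_exit_layer.items():
    print(exit_layer, round(mean_objective(samples), 3))
```

Clipping the ratio keeps any single update from moving the model too far from its previous behavior, which is why PPO-style training is a common choice for alignment fine-tuning.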

“Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down,” said Saketh Bachu, UCR graduate student and co-lead author of the study.

To test their method, the researchers used LLaVA 1.5, a vision‑language model capable of processing both text and images. They found that certain combinations, such as pairing a harmless image with a malicious question, could bypass the model’s safety filters. In one instance, the altered model responded with detailed instructions for building a bomb.

After retraining, however, the model reliably refused to answer dangerous queries, even when deployed with only a fraction of its original architecture.

“This isn’t about adding filters or external guardrails,” Bachu said. “We’re changing the model’s internal understanding, so it’s on good behavior by default, even when it’s been modified.”

Bachu and co-lead author Erfan Shayegani, also a graduate student, describe the work as “benevolent hacking,” a way of fortifying models before vulnerabilities can be exploited. Their ultimate goal is to develop techniques that ensure safety across every internal layer, making AI more robust in real‑world conditions.

In addition to Roy-Chowdhury, Bachu, and Shayegani, the research team included doctoral students Arindam Dutta, Rohit Lal, and Trishna Chakraborty, and UCR faculty members Chengyu Song, Yue Dong, and Nael Abu-Ghazaleh. Their work was presented this year at the International Conference on Machine Learning in Vancouver, Canada.

“There’s still more work to do,” Roy-Chowdhury said. “But this is a concrete step toward developing AI in a way that’s both open and responsible.”

More information:
Saketh Bachu et al, Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models, arXiv (2024). DOI: 10.48550/arxiv.2411.04291

Journal information:
arXiv

Provided by
University of California – Riverside

Citation:
Retraining AI to fortify itself against rogue rewiring even after key layers are removed (2025, September 5)
retrieved 5 September 2025
from https://techxplore.com/news/2025-09-retraining-ai-fortify-rogue-rewiring.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.




