OpenAI Demos a Control Method for Superintelligent AI

By Simon Osuji
December 14, 2023
in Artificial Intelligence
One day, the theory goes, we humans will create AI systems that outmatch us intellectually. That could be great if they solve problems that we’ve been thus far unable to crack (think cancer or climate change), or really bad if they begin to act in ways that are not in humanity’s best interests, and we’re not smart enough to stop them.

So earlier this year, OpenAI launched its superalignment program, an ambitious attempt to find technical means to control a superintelligent AI system, or “align” it with human goals. OpenAI is devoting 20 percent of its compute to this effort, and hopes to have solutions by 2027.

The biggest challenge for this project: “This is a future problem about future models that we don’t even know how to design, and certainly don’t have access to,” says Collin Burns, a member of OpenAI’s superalignment team. “This makes it very tricky to study—but I think we also have no choice.”

The first preprint paper to come out from the superalignment team showcases one way the researchers tried to get around that constraint. They used an analogy: Instead of seeing whether a human could adequately supervise a superintelligent AI, they tested a weak AI model’s ability to supervise a strong one. In this case, GPT-2 was tasked with supervising the vastly more powerful GPT-4. Just how much more powerful is GPT-4? While GPT-2 has 1.5 billion parameters, GPT-4 is rumored to have 1.76 trillion parameters (OpenAI has never released the figures for the more powerful model).

It’s an interesting approach, says Jacob Hilton of the Alignment Research Center; he was not involved with the current research, but is a former OpenAI employee. “It has been a long-standing challenge to develop good empirical testbeds for the problem of aligning the behavior of superhuman AI systems,” he tells IEEE Spectrum. “This paper makes a promising step in that direction and I am excited to see where it leads.”

“This is a future problem about future models that we don’t even know how to design, and certainly don’t have access to.” —Collin Burns, OpenAI

The OpenAI team gave the GPT pair three types of tasks: chess puzzles, a set of natural language processing (NLP) benchmarks such as commonsense reasoning, and questions based on a dataset of ChatGPT responses, where the task was predicting which of multiple responses would be preferred by human users. In each case, GPT-2 was trained specifically on these tasks—but since it’s not a very large or capable model, it didn’t perform particularly well on them. Then its training was transferred over to a version of GPT-4 with only basic training and no fine-tuning for these specific tasks. But remember: GPT-4 with only basic training is still a much more capable model than GPT-2.

The researchers wondered whether GPT-4 would make the same mistakes as its supervisor, GPT-2, which had essentially given it instructions for how to do the tasks. Remarkably, the stronger model consistently outperformed its weak supervisor. The strong model did particularly well on the NLP tasks, achieving a level of accuracy comparable to GPT-3.5. Its results were less impressive with the other two tasks, but they were “signs of life” to encourage the group to keep trying with these tasks, says Leopold Aschenbrenner, another researcher on the superalignment team.
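The preprint quantifies how much of the gap between supervisor and student the strong model closes with a "performance gap recovered" (PGR) metric. The sketch below is a paraphrase of that idea, not the paper's code, and the numbers in the usage line are illustrative rather than reported results.

```python
def performance_gap_recovered(weak_acc: float,
                              weak_to_strong_acc: float,
                              strong_ceiling_acc: float) -> float:
    """Fraction of the gap between the weak supervisor's accuracy and the
    strong model's own ceiling that the weakly supervised strong model
    recovers. 1.0 means the student matched its ceiling; 0.0 means it
    merely imitated its weak supervisor."""
    return (weak_to_strong_acc - weak_acc) / (strong_ceiling_acc - weak_acc)

# Illustrative numbers only: a weak supervisor at 60% accuracy, a strong
# ceiling of 85%, and a weakly supervised student landing at 75%.
print(round(performance_gap_recovered(0.60, 0.75, 0.85), 3))  # 0.6
```

On this scale, the NLP results described above (GPT-2-supervised GPT-4 reaching roughly GPT-3.5-level accuracy) correspond to a large fraction of the gap recovered, while the "signs of life" on the other tasks correspond to smaller fractions.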

The researchers call this phenomenon weak-to-strong generalization; they say it shows that the strong model had implicit knowledge of how to perform the tasks, and could find that knowledge within itself even when given shoddy instructions.
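A cartoon of this effect (our illustration, not the paper's experiment): when a supervisor's errors are unsystematic noise rather than a coherent wrong rule, a student with the right inductive bias can fit the noisy labels and still land closer to the truth than its teacher. In the toy below, a hypothetical "weak supervisor" flips 20 percent of labels at random, and a "strong student" restricted to simple threshold rules recovers a near-true threshold from those noisy labels.

```python
import random

random.seed(0)

# Toy setup: the true concept is sign(x); the "weak supervisor" labels
# correctly only 80% of the time (the other 20% are random flips).
n = 2000
xs = [random.uniform(-1, 1) for _ in range(n)]
true_labels = [1 if x > 0 else 0 for x in xs]
weak_labels = [y if random.random() < 0.8 else 1 - y for y in true_labels]

def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

weak_acc = accuracy(weak_labels, true_labels)  # roughly 0.80

# "Strong student": restricted to threshold rules (1 if x > t else 0).
# Fitting it to the noisy weak labels still recovers a threshold near
# the truth, because the supervisor's errors don't form a pattern the
# student's hypothesis class can latch onto.
def fit_threshold(xs, labels):
    best_t, best_agree = None, -1.0
    for i in range(-100, 101):
        t = i / 100
        preds = [1 if x > t else 0 for x in xs]
        agree = accuracy(preds, labels)
        if agree > best_agree:
            best_t, best_agree = t, agree
    return best_t

t = fit_threshold(xs, weak_labels)
strong_preds = [1 if x > t else 0 for x in xs]
strong_acc = accuracy(strong_preds, true_labels)

print(f"weak supervisor accuracy: {weak_acc:.2f}")
print(f"strong student accuracy:  {strong_acc:.2f}")
```

The student here beats its teacher for a mundane statistical reason; the paper's claim is the stronger, emergent version of this, where the student's "inductive bias" is the latent task knowledge already inside the large pretrained model.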

In this first experiment, the approach worked best with the NLP tasks because they’re fairly simple tasks with clear right and wrong answers, the team says. It did worst with the tasks from the ChatGPT database, in which it was asked to determine which responses humans would prefer, because the answers were less clear cut. “Some were subtly better, some were subtly worse,” says Aschenbrenner.

Could this alignment technique scale to superintelligent AI?

Burns gives an example of how a similar situation might play out in a future with superintelligent AI. “If you ask it to code something, and it generates a million lines of extremely complicated code interacting in totally new ways that are qualitatively different from how humans program, you might not be able to tell: Is this doing what we ask it to do?” Humans might also give it a corollary instruction, such as: Don’t cause catastrophic harm in the course of your coding work. If the model has benefitted from weak-to-strong generalization, it might understand what it means to cause catastrophic harm and see—better than its human supervisors can—whether its work is straying into dangerous territory.

“We can only supervise simple examples that we can understand,” Burns says. “We need [the model] to generalize to much harder examples that superhuman models themselves understand. We need to elicit that understanding of: ‘is it safe or not, does following instructions count,’ which we can’t directly supervise.”

Some might argue that these results are actually a bad sign for superalignment, because the stronger model deliberately ignored the (erroneous) instructions given to it and pursued its own agenda of getting the right answers. But Burns says that humanity doesn’t want a superintelligent AI that follows incorrect instructions. What’s more, he says, “in practice many of the errors of the weak supervisor will be more of the form: ‘this problem is way too hard for me, and I don’t have a strong opinion either way.’” In that case, he says, we’ll want a superintelligence that can figure out the right answers for us.

To encourage other researchers to chip away at such problems, OpenAI announced today that it’s offering US $10 million in grants for work on a wide variety of alignment approaches. “Historically, alignment has been more theoretical,” says Pavel Izmailov, another member of the superalignment team. “I think this is work that’s available to academics, grad students, and the machine learning community.” Some of the grants are tailored for grad students and offer both a $75,000 stipend and a $75,000 compute budget.

Burns adds: “We’re very excited about this, because I think for the first time we really have a setting where we can study this problem of aligning future superhuman models.” It may be a future problem, he says, but they can “make iterative empirical progress today.”

© 2023 LBNN - All rights reserved.
