Sunday, May 18, 2025
LBNN
A new metric to quantify capabilities of AI systems in terms of human capabilities

by Simon Osuji
March 20, 2025
in Artificial Intelligence
Our methodology for measuring AI agent time horizon. Credit: arXiv (2025). DOI: 10.48550/arxiv.2503.14499

A team of AI researchers at METR, a nonprofit research organization, is proposing a new metric that quantifies the capabilities of AI systems in terms of human capabilities. They have published a paper on the arXiv preprint server describing the metric, which they call “task-completion time horizon” (TCTH).

Large language models (LLMs) such as those in the GPT series are producing more reliable results with each new iteration. In this new study, the team in California noted that the measures currently used to describe such models do not fully capture a system’s capabilities. To address this, they have devised a metric that quantifies capabilities in a way that applies across multiple fields, such as writing computer programs or generating the steps needed to carry out a task.

With TCTH, tasks are quantified by how long they take humans to complete, and AI systems are then tested on the same tasks. As one example, the researchers found that early LLMs failed to complete any of a certain group of tasks that human experts could get done in one minute. In sharp contrast, Claude 3.7 Sonnet can complete 50% of certain tasks that took humans an average of 59 minutes.

The length of tasks (measured by how long they take human professionals) that generalist autonomous frontier model agents can complete with 50% reliability has been doubling approximately every 7 months for the last 6 years. Credit: arXiv (2025). DOI: 10.48550/arxiv.2503.14499

By assembling a list of tasks and measuring how long each takes a human to complete, the new metric could serve as the basis for a benchmark of how well AI models are stacking up. Such benchmarks, the researchers suggest, should be based on a 50% success rate, because that threshold has so far proven the most robust to variations in the distribution of the data.
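To make the idea of a 50% time horizon concrete, here is a minimal sketch of one way such a number could be estimated: fit a logistic curve of success probability against the logarithm of human completion time, then solve for the task length at which the curve crosses 50%. The data, the function name fit_time_horizon and the plain gradient-ascent fit are all illustrative assumptions, not the researchers' code or results.

```python
import math

# HYPOTHETICAL data, invented for illustration only:
# (human completion time in minutes, AI successes, AI attempts)
HYPOTHETICAL_RESULTS = [
    (1, 10, 10), (2, 9, 10), (4, 9, 10), (8, 7, 10), (16, 5, 10),
    (32, 3, 10), (64, 1, 10), (128, 1, 10), (256, 0, 10),
]

def fit_time_horizon(results, lr=0.1, steps=5000):
    """Estimate the task length (minutes) at which success probability is 50%.

    Fits p = sigmoid(w * (log2(minutes) - mean) + b) by gradient ascent on
    the log-likelihood. This is an assumed, simplified fitting procedure.
    """
    total = sum(n for _, _, n in results)
    xs = [math.log2(minutes) for minutes, _, _ in results]
    # Centre the log-times so the two parameters are easier to fit.
    mean_x = sum(x * n for x, (_, _, n) in zip(xs, results)) / total
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, (_, successes, attempts) in zip(xs, results):
            xc = x - mean_x
            p = 1.0 / (1.0 + math.exp(-(w * xc + b)))
            grad_w += (successes - attempts * p) * xc
            grad_b += successes - attempts * p
        w += lr * grad_w / total
        b += lr * grad_b / total
    # The curve crosses 50% where w * xc + b == 0, i.e. xc = -b / w.
    return 2.0 ** (mean_x - b / w)

if __name__ == "__main__":
    horizon = fit_time_horizon(HYPOTHETICAL_RESULTS)
    print(f"Estimated 50% time horizon: {horizon:.1f} minutes")
```

Because the toy data are symmetric around the 16-minute task, the fitted horizon lands at roughly 16 minutes; real results would be noisier and would need far more tasks.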

As part of their work with the new metric, the research team found that AI models are improving dramatically at completing long tasks in areas such as programming, cybersecurity, general reasoning and machine learning. Such progress suggests that they could soon be used to carry out major assignments such as chemical discovery or even whole engineering projects.
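The arithmetic behind that kind of projection can be sketched with the two figures reported above: a 59-minute horizon for Claude 3.7 Sonnet and a doubling time of roughly seven months. The function names and the 8-hour target are hypothetical, and the calculation simply assumes the reported trend continues unchanged, which the study does not guarantee.

```python
import math

DOUBLING_MONTHS = 7.0        # approximate doubling time from the figure caption
CURRENT_HORIZON_MIN = 59.0   # Claude 3.7 Sonnet figure quoted in the article

def projected_horizon(months_ahead, start=CURRENT_HORIZON_MIN,
                      doubling=DOUBLING_MONTHS):
    """Time horizon in minutes after months_ahead, assuming steady doubling."""
    return start * 2.0 ** (months_ahead / doubling)

def months_until(target_minutes, start=CURRENT_HORIZON_MIN,
                 doubling=DOUBLING_MONTHS):
    """Months needed for the horizon to reach target_minutes under the same assumption."""
    return doubling * math.log2(target_minutes / start)

if __name__ == "__main__":
    print(f"Horizon after 14 months: {projected_horizon(14):.0f} minutes")
    # 480 minutes (one 8-hour working day) is an arbitrary illustrative target.
    print(f"Months until an 8-hour horizon: {months_until(480):.1f}")
```

Under these assumptions the horizon would pass a full working day in a little under two years, but small changes to the doubling time shift that estimate considerably.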

More information:
Thomas Kwa et al, Measuring AI Ability to Complete Long Tasks, arXiv (2025). DOI: 10.48550/arxiv.2503.14499

Journal information:
arXiv

© 2025 Science X Network

Citation:
A new metric to quantify capabilities of AI systems in terms of human capabilities (2025, March 20)
retrieved 20 March 2025
from https://techxplore.com/news/2025-03-metric-quantify-capabilities-ai-terms.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.




