New platform helps evaluate AI for complex computer use

by Simon Osuji
February 21, 2025
in Artificial Intelligence


robot using computer
Credit: Pixabay/CC0 Public Domain

Imagine asking AI to plan your trip itinerary, book and pay for all your flights, and arrange your airport transport, all with a single click. An international research team is working to make this vision a reality.

The team, composed of researchers from the University of Waterloo, the University of Hong Kong, Salesforce Research and Carnegie Mellon University, developed Computer Agent Arena, an evaluation platform that can be used to build and improve computer agents.

A computer agent is a type of software that can perform tasks on behalf of a person or organization without needing constant human intervention. It can interpret the state of the computer and act autonomously to help users solve problems. Examples of computer agents include voice assistants like Siri and Alexa, which can help users send messages and schedule meetings.
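
The "interpret the state, then act" cycle described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: `ToyDesktop`, `choose_action` and `run_agent` are hypothetical names invented for illustration, and the "computer" is reduced to a single text field. It is not the interface of Computer Agent Arena or OSWorld.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDesktop:
    """Hypothetical stand-in for a computer: one text field the agent types into."""
    typed: str = ""

    def observe(self) -> str:
        # The "state of the computer" the agent gets to see.
        return self.typed

    def apply(self, action: str) -> None:
        # An action here is simply "type this character".
        self.typed += action


def choose_action(observation: str, goal: str) -> Optional[str]:
    """Pick the next character to type, or None when nothing is left to do."""
    if len(observation) >= len(goal):
        return None
    return goal[len(observation)]


def run_agent(env: ToyDesktop, goal: str, max_steps: int = 50) -> int:
    """Observe, decide, act, and repeat without human input; return steps taken."""
    steps = 0
    for _ in range(max_steps):
        action = choose_action(env.observe(), goal)
        if action is None:
            break
        env.apply(action)
        steps += 1
    return steps


if __name__ == "__main__":
    desktop = ToyDesktop()
    taken = run_agent(desktop, "hello")
    print(desktop.typed, taken)
```

A real agent would replace `choose_action` with a call to a language or vision model and `ToyDesktop` with screenshots, mouse and keyboard events, but the loop has the same shape.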

AI-based computer agents struggle with complex computer tasks because such tasks require controlling multiple applications over many steps. For example, filing an expense report may be difficult because it requires updating a spreadsheet by searching multiple emails and folders filled with bank statements and receipts.

Computer Agent Arena is the first interactive computer use evaluation platform that focuses on performing diverse tasks across multiple applications. This work is an extension of the researchers’ work on OSWorld, the world’s first scalable and real computer environment for multimodal agents.






Credit: University of Waterloo

“Computer Agent Arena provides a platform for the research community to develop effective and efficient agents that generalize to real-world computer usage,” says co-developer Dr. Victor Zhong, assistant professor at the Cheriton School of Computer Science. Like other Waterloo researchers, he is investigating human-technology interactions, exploring how to mitigate everyday problems by creating novel technologies.

“Computer Agent Arena is distinct from similar research like Mind2Web and WebArena because it provides unified application programming interfaces for comprehensive observations and actions in an executable environment with multiple applications.”

Through Computer Agent Arena, users can assess and compare various computer agents based on large language models (LLMs) and vision language models. First, users select an operating system, such as Windows, and applications, such as Google Chrome and Excel. Users then prompt the computer agent with a task, which two AI models perform simultaneously in real time. After completion, users can rate each model’s performance and provide feedback.
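
To make the side-by-side workflow concrete, here is a small sketch of how head-to-head user ratings could be tallied into per-model win rates. The article does not say how the platform aggregates feedback, so the scoring rule (a tie counts as half a win for each side), the `Vote` record and the model names are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Vote:
    """One user rating of a task performed by two models (hypothetical record)."""
    task: str
    model_a: str
    model_b: str
    winner: str  # "a", "b", or "tie"


def win_rates(votes: Iterable[Vote]) -> Dict[str, float]:
    """Return each model's share of wins; a tie counts as half a win for both."""
    wins: Counter = Counter()
    games: Counter = Counter()
    for vote in votes:
        games[vote.model_a] += 1
        games[vote.model_b] += 1
        if vote.winner == "a":
            wins[vote.model_a] += 1
        elif vote.winner == "b":
            wins[vote.model_b] += 1
        elif vote.winner == "tie":
            wins[vote.model_a] += 0.5
            wins[vote.model_b] += 0.5
        else:
            raise ValueError(f"unknown winner value: {vote.winner!r}")
    return {model: wins[model] / games[model] for model in games}


if __name__ == "__main__":
    # Placeholder model names, not real systems.
    sample = [
        Vote("file an expense report", "model-x", "model-y", "a"),
        Vote("schedule a meeting", "model-x", "model-y", "tie"),
    ]
    print(win_rates(sample))
```

Simple win rates are only one option; arena-style comparisons are often summarized with rating systems such as Elo, which also account for the strength of each opponent.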

Ultimately, the team seeks to provide a diverse and dynamic platform for building and evaluating agents that can perform real-world computer tasks as safely, effectively and efficiently as humans do.

“Our current findings show that foundation models such as GPT4 and Claude are far from being able to act safely and effectively as assistant computer agents,” Zhong says. “Computer Agent Arena provides a timely testbed to develop the next generation of AI agents.”

Provided by
University of Waterloo

Citation:
New platform helps evaluate AI for complex computer use (2025, February 20)
retrieved 20 February 2025
from https://techxplore.com/news/2025-02-platform-ai-complex.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.






