
AI Agents Take Control: Exploring Computer-Use Agents

By Simon Osuji
February 13, 2025



Two years after the generative AI boom really began with the launch of ChatGPT, it no longer seems that exciting to have a phenomenally helpful AI assistant hanging around in your web browser or phone, just waiting for you to ask it questions. The next big push in AI is for AI agents that can take action on your behalf. But while agentic AI has already arrived for power users like coders, everyday consumers don’t yet have these kinds of AI assistants.

That will soon change. Anthropic, Google DeepMind, and OpenAI have all recently unveiled experimental models that can use computers the way people do: searching the web for information, filling out forms, and clicking buttons. With a little guidance from the human user, they can do things like order groceries, call an Uber, hunt for the best price for a product, or find a flight for your next vacation. And while these early models have limited abilities and aren’t yet widely available, they show the direction that AI is going.


“This is just the AI clicking around,” said OpenAI CEO Sam Altman in a demo video as he watched the OpenAI agent, called Operator, navigate to OpenTable, look up a San Francisco restaurant, and check for a table for two at 7pm.

Zachary Lipton, an associate professor of machine learning at Carnegie Mellon University, notes that AI agents are already being embedded in specialized software for different types of enterprise customers such as salespeople, doctors, and lawyers. But until now, we haven’t seen AI agents that can “do routine stuff on your laptop,” he says. “What’s intriguing here is the possibility of people starting to hand over the keys.”

AI Agents from Anthropic, Google DeepMind, and OpenAI

Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.” The company stressed that it was giving the models this capability as a public beta test, and that it’s only available to developers who are building tools and products on top of Anthropic’s large language models. Claude navigates by viewing screenshots of what the user sees and counting the pixels required to move the cursor to a certain spot for a click. A spokesperson for Anthropic says that Claude can do this work on any computer and within any desktop application.

Next out of the gate was Google DeepMind with its Project Mariner, built on top of Google’s Gemini 2 language model. The company showed Mariner off in December but called it an “early research prototype” and said it’s only making the tool available to “trusted testers” for now. As another precaution, Mariner currently only operates within the Chrome browser, and only within an active tab, meaning that it won’t run in the background while you work on other tasks. While this requirement seems to somewhat defeat the purpose of having a time-saving AI helper, it’s likely just a temporary condition for this early stage of development.

Finally, in January OpenAI launched its computer-use agent (CUA), called Operator. OpenAI called it a “research preview” and made it available only to users who pay US $200 per month for OpenAI’s premium service, though the company said it’s working toward broader release. Yash Kumar, an engineer on the Operator team, says the tool can work with essentially any website. “We’re starting with the browser because this is where the majority of work happens,” Kumar says. But he notes that “the CUA model is also trained to use a computer, so it’s possible we could expand it” to work with other desktop apps.

Like the others, Operator relies on chain-of-thought reasoning to take instructions and break them down into a series of tasks that it can complete. If it needs more information to complete a task (for example, whether you prefer red or yellow onions), it will pause and ask for input. It also asks for confirmation before taking a final step, like booking the restaurant table or putting in the grocery order.
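The pause-and-confirm pattern described above can be sketched as a small human-in-the-loop loop. This is a hypothetical illustration, not OpenAI's implementation: the `Step` class, the `run_plan` function, and the `ask` and `confirm` callbacks are invented names, and the plan is hard-coded where a real agent would generate it with a language model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One task in a hypothetical agent plan (names are illustrative only)."""
    description: str
    needs_input: Optional[str] = None  # question to ask the user, if a detail is missing
    is_final: bool = False             # irreversible step, e.g. placing an order


def run_plan(steps: list[Step],
             ask: Callable[[str], str],
             confirm: Callable[[str], bool]) -> list[str]:
    """Walk through the plan, pausing for missing details and
    requiring explicit confirmation before any final step."""
    log: list[str] = []
    for step in steps:
        if step.needs_input is not None:
            # Pause and hand the question back to the human.
            answer = ask(step.needs_input)
            log.append(f"asked: {step.needs_input} -> {answer}")
        if step.is_final and not confirm(step.description):
            # The user declined, so stop before the irreversible action.
            log.append(f"stopped before: {step.description}")
            return log
        log.append(f"done: {step.description}")
    return log


# Example: a grocery order where the user answers a question but declines the final step.
plan = [
    Step("open grocery site"),
    Step("add onions to cart", needs_input="Red or yellow onions?"),
    Step("place order", is_final=True),
]
log = run_plan(plan, ask=lambda q: "red", confirm=lambda d: False)
print(log)
```

Keeping `confirm` as a separate callback means the final, irreversible step cannot run without an explicit yes from the user, regardless of what the plan contains.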

Safety Concerns for Computer-Use Agents

Here are some things that computer-use agents can’t yet do: log in to sites, agree to terms of service, solve captchas, and enter credit card or other payment details. If an agent comes up against one of these roadblocks, it hands the steering wheel back to the human user. OpenAI notes that Operator doesn’t take screenshots of the browser while the user is entering login or payment information.

The three companies have all noted that putting an AI in charge of your computer could pose safety risks. Anthropic has specifically raised the concern of prompt injection attacks, or ways in which malicious actors can add something to the user’s prompt to make the model take an unexpected action. “Since Claude can interpret screenshots from computers connected to the internet, it’s possible that it may be exposed to content that includes prompt injection attacks,” Anthropic wrote in a blog post.

CMU’s Lipton says that the companies haven’t revealed much information about the computer-use agents and how they work, so it’s hard to assess the risks. “If someone is getting your computer operator to do something nefarious, does that mean they already have access to your computer?” he wonders, and if so, why wouldn’t the miscreant just take action directly?

Still, Lipton says, with all the actions we take and purchases we make online, “It doesn’t require a wild leap of imagination to imagine actions that would leave the user in a pickle.” For example, he says, “Who will be the first person who wakes up and says, ‘My [agent] bought me a fleet of cars?’”

The Future of Computer-Use Agents

While none of the companies have revealed a timeline for making their computer-use agents broadly available, it seems likely that consumers will begin to get access to them this year—either through the big AI companies or through startups creating cheaper knockoffs.

OpenAI’s Kumar says it’s an exciting time, and that Operator marks a step toward a more collaborative future for humans and AI. “It’s a stepping stone on our path to AGI,” he says, referring to the long-promised dream/nightmare of artificial general intelligence. “The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks.”

If you remember the prescient 2013 movie Her, it seems like we’re edging toward the world that existed at the beginning of the film, before the sultry-voiced Samantha began speaking into the protagonist’s ear. It’s a world in which everyone has a boring and neutral AI to help them read and respond to messages and take care of other mundane tasks. Once the AI companies solidly achieve that goal, they’ll no doubt start working on Samantha.

© 2023 LBNN - All rights reserved.