A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has announced the development of a graphical user interface (GUI) agent model called UI-TARS. The group describes the model in a paper posted to the arXiv preprint server.
Over the past decade, AI applications have flourished. Among the best known are large language models (LLMs) such as ChatGPT, but many other AI systems are being developed for a variety of purposes. One such purpose is helping computer users carry out mundane tasks, such as finding the cheapest airfare between two cities and then buying the tickets. Tasks like these typically involve time-consuming web browsing.
AI researchers have suggested that such tasks could be automated by intelligent agents. In this new study, the team in China has done just that with the development of UI-TARS, a GUI agent model that can run locally on a personal computer or be accessed through the cloud on other devices.
The model was trained on 50 billion tokens of data representing the features of GUIs, captured largely through screenshots of interfaces such as conventional web pages. Training also involved reflection tuning, in which the model learns from its mistakes and adapts, changing how it approaches different or unfamiliar situations.
When UI-TARS is running, the user sees two tabs. One shows the "thinking process" the app goes through as it carries out its overall task; the other shows the websites, files or other GUIs it is working with. If it were used to book a flight, for example, a user could watch the airline websites being viewed and then switch to the other tab to see what the app was doing with them.
At the end of the process, the user is shown a final web page asking them to confirm the ticket purchase. In testing, the team found that their model outperformed other AI models, such as GPT-4o and Gemini 2.0, on GUI-agent benchmarks.
More information:
Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326
UI-TARS: github.com/bytedance/UI-TARS
arXiv
© 2025 Science X Network
Citation:
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets (2025, January 23) retrieved 23 January 2025 from https://techxplore.com/news/2025-01-gui-agent-automate-tasks-airline.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.