ColorBrowserAgent: An Intelligent GUI Agent for Complex Long-Horizon Web Automation
By: Jiamu Zhou, Jihong Wang, Weiming Zhang, and others
The web browser serves as a primary interface for daily human activities, making its automation a critical frontier for Human-Centred AI. While Large Language Models (LLMs) have enabled autonomous agents to interact with web GUIs, their reliability in real-world scenarios is hampered by long-horizon instability and the vast heterogeneity of site designs. In this paper, we introduce ColorBrowserAgent, a framework designed for Collaborative Autonomy in complex web tasks. Our approach integrates two human-centred mechanisms: (1) Progressive Progress Summarization, which mimics human short-term memory to maintain coherence over extended interactions; and (2) Human-in-the-Loop Knowledge Adaptation, which bridges the knowledge gap in diverse environments by soliciting expert intervention only when necessary. This symbiotic design allows the agent to learn from human tips without extensive retraining, effectively combining the scalability of AI with the adaptability of human cognition. Evaluated on the WebArena benchmark using GPT-5, ColorBrowserAgent achieves a state-of-the-art success rate of 71.2%, demonstrating the efficacy of interactive human assistance in robust web automation.
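The abstract describes its two mechanisms only at a high level. The following is a minimal Python sketch showing how they might fit together in a single control loop. It rests on our own assumptions and is not the authors' implementation. A bounded running summary stands in for Progressive Progress Summarization. A per-site tip store, filled by a human only when the policy cannot choose an action, stands in for Human-in-the-Loop Knowledge Adaptation. All names (`ProgressSummary`, `TipStore`, `run_episode`, `toy_policy`, `toy_env`, `human`) are hypothetical. In the actual system an LLM would presumably write the summary and choose actions, and the trigger for asking a human may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces; these are illustrative stand-ins, not the ColorBrowserAgent API.
Policy = Callable[[str, str, List[str], str], Optional[str]]  # returns None when stuck
Env = Callable[[str], Tuple[str, bool]]                       # action -> (observation, done)
AskHuman = Callable[[str, str], str]                          # (site, summary) -> tip


@dataclass
class ProgressSummary:
    """Bounded running summary used in place of the full action history."""
    notes: List[str] = field(default_factory=list)
    keep_last: int = 5  # assumed window; an LLM summarizer would compress rather than truncate

    def update(self, action: str, observation: str) -> None:
        self.notes.append(f"{action} -> {observation}")
        # Keep only recent notes so the prompt does not grow with the task horizon.
        self.notes = self.notes[-self.keep_last:]

    def text(self) -> str:
        return " | ".join(self.notes)


@dataclass
class TipStore:
    """Human-provided tips keyed by site, reused on later tasks without retraining."""
    tips: Dict[str, List[str]] = field(default_factory=lambda: defaultdict(list))

    def get(self, site: str) -> List[str]:
        return list(self.tips.get(site, []))

    def add(self, site: str, tip: str) -> None:
        self.tips[site].append(tip)


def run_episode(task: str, site: str, first_obs: str, policy: Policy, env: Env,
                ask_human: AskHuman, store: TipStore, max_steps: int = 20) -> bool:
    memory = ProgressSummary()
    obs = first_obs
    for _ in range(max_steps):
        action = policy(task, memory.text(), store.get(site), obs)
        if action is None:
            # Stuck: solicit a human tip only now, store it, and retry on the next step.
            store.add(site, ask_human(site, memory.text()))
            continue
        obs, done = env(action)
        memory.update(action, obs)
        if done:
            return True
    return False


# Toy usage: the policy does not know how to use this site until a human explains it.
def toy_policy(task: str, summary: str, tips: List[str], obs: str) -> Optional[str]:
    if "search" in summary:  # already searched, so open the result
        return "click result"
    if any("search box" in t for t in tips):
        return "search"
    return None


def toy_env(action: str) -> Tuple[str, bool]:
    if action == "search":
        return "results shown", False
    if action == "click result":
        return "item page", True
    return "nothing happened", False


asked: List[str] = []


def human(site: str, summary: str) -> str:
    asked.append(site)
    return "Use the search box at the top."


store = TipStore()
ok1 = run_episode("find item", "shop.example", "home page", toy_policy, toy_env, human, store)
ok2 = run_episode("find item", "shop.example", "home page", toy_policy, toy_env, human, store)
```

In this toy run the human is consulted once, during the first episode. The second episode on the same site reuses the stored tip and finishes without intervention. This mirrors the abstract's claim that the agent learns from human tips without retraining, though only in this simplified setting.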