In artificial intelligence, 2025 marked a decisive shift. Systems once confined to research labs and prototypes began to appear as everyday tools. At the center of this transition was the rise of AI agents – systems that can use other software tools and act on their own.
AI agents moved from theory to infrastructure, reshaping how people interact with large language models, the systems that power chatbots such as ChatGPT.
In 2025, the definition of an AI agent shifted from the academic framing of systems that perceive, reason and act to a more practical one: large language models that can use software tools, take autonomous action and coordinate with other systems to complete tasks independently.
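In practice, that means a loop: the model reads the task, picks a tool, observes the result and repeats until it decides it is done. Here is a minimal sketch of that loop; every name in it, including the call_llm stand-in and the toy search tool, is hypothetical for illustration rather than any vendor’s actual API.

```python
# Minimal sketch of the loop at the heart of most agent systems.
# Everything here is a hypothetical stand-in, not a real product's API.

TOOLS = {
    "search": lambda q: f"(top results for {q!r})",  # toy tool
}

def call_llm(history, tools):
    """Stand-in for a real model call. A real agent would send the
    history to a large language model and parse its chosen action."""
    if len(history) == 1:
        return {"type": "tool", "tool": "search", "input": history[0]}
    return {"type": "final_answer", "text": history[-1]}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = call_llm(history, tools=list(TOOLS))    # model picks next step
        if action["type"] == "final_answer":             # model decides it's done
            return action["text"]
        result = TOOLS[action["tool"]](action["input"])  # agent runs the tool
        history.append(f"{action['tool']} -> {result}")
    return "Stopped: step limit reached."

print(run_agent("Find the tallest building in Pittsburgh"))
```

The step limit is the key design choice: because the model, not a human, decides what happens next, real agent systems cap how many autonomous actions can run before a person checks in.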
In January, the release of Chinese model DeepSeek-R1 as an open-weight model disrupted assumptions about who could build high-performing large language models. An open-weight model is an AI model whose trained parameters, the values called weights, are publicly released for anyone to download and run.
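In concrete terms, open weights mean anyone can fetch the parameters and run the model on their own hardware. A rough sketch of what that looks like, assuming the Hugging Face transformers library and a publicly posted checkpoint such as deepseek-ai/DeepSeek-R1 (the full model is far too large for consumer machines, and some checkpoints need extra loading options):

```python
# Sketch: loading an open-weight model with the Hugging Face
# transformers library. Assumes the checkpoint fits on your hardware;
# DeepSeek-R1 itself requires data-center-class machines, and some
# checkpoints need options such as trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # publicly posted weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What makes a model 'open weight'?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```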
By mid-2025, “agentic browsers” began to appear. Tools such as Copilot in Microsoft’s Edge reframed the browser as an active participant rather than a passive interface.
At the same time, workflow builders such as Google’s Antigravity lowered the technical barrier to creating custom agent systems, extending what coding agents such as GitHub Copilot had already started.
The risks of agents became harder to ignore. In November, Anthropic disclosed how its Claude Code agent had been misused to automate parts of a cyberattack. The incident illustrated a broader concern: By automating repetitive, technical work, AI agents can also lower the barrier for malicious activity.
This tension defined much of 2025. AI agents expanded what individuals and organizations could do, but they also amplified existing vulnerabilities.
Several open questions are likely to shape the next phase of AI agents in 2026.
One is benchmarks. Traditional benchmarks work like a structured exam: a fixed series of questions with standardized scoring. That suits single models well, but agents are composite systems made up of models, tools, memory and decision logic, so researchers increasingly want to evaluate not just outcomes but the processes that produce them.
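A hedged illustration of the difference: an outcome-only check scores just the final answer, while a process check inspects the trace of tool calls the agent made along the way. The trace format and required steps below are invented for illustration, not drawn from any existing benchmark.

```python
# Sketch of outcome vs. process evaluation for an agent.
# The trace format and expected steps are invented for illustration.

def outcome_score(answer: str, expected: str) -> bool:
    # Traditional benchmark style: only the final answer matters.
    return answer.strip() == expected.strip()

def process_score(trace: list[str], required_steps: list[str]) -> float:
    # Agent-style evaluation: did the system take the right steps,
    # in the right order, before producing its answer?
    hits, pos = 0, 0
    for step in required_steps:
        for i in range(pos, len(trace)):
            if step in trace[i]:
                hits, pos = hits + 1, i + 1
                break
    return hits / len(required_steps)

trace = ["search -> flight prices", "calculator -> total cost", "final_answer"]
print(process_score(trace, ["search", "calculator"]))  # 1.0: both steps, in order
```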
Progress here will be critical for improving reliability and trust, and for ensuring that an agent actually performs the task at hand. One starting point is establishing clear definitions of AI agents and AI workflows; organizations will need to map out exactly where AI fits into existing workflows or introduces new ones.
Another development to watch is governance. In late 2025, the Linux Foundation announced the creation of the Agentic AI Foundation, signaling an effort to establish shared standards and best practices. If successful, it could play a role similar to that of the World Wide Web Consortium in shaping an open, interoperable agent ecosystem.
There is also a growing debate over model size. While large, general-purpose models dominate headlines, smaller and more-specialized models are often better suited to specific tasks. As agents become configurable consumer and business tools, the power to choose the right model increasingly shifts to users rather than labs or corporations.
Significant socio-technical challenges remain. Expanding data center infrastructure strains energy grids and affects local communities. In workplaces, agents raise concerns about automation, job displacement and surveillance.
From a security perspective, connecting models to tools and stacking agents together multiplies the points where something can go wrong. Regulation is another unresolved issue: The U.S. has relatively limited oversight of algorithmic systems.
Meeting these challenges will require more than technical breakthroughs. It demands rigorous engineering practices, careful design and clear documentation of how systems work and fail.
Thomas Serban von Davier is an affiliated faculty member at the Carnegie Mellon Institute for Strategy and Technology. Distributed by The Conversation and The Associated Press.