State of AI Agents: Not autonomous (yet)
Last week I attended the IA 2024 Summit, where there was a lot of excitement around AI Agents, among other AI topics. Agents are an area I am very excited about, albeit with tempered expectations. I think we are in the very early days of realizing the true potential of AI Agents: truly autonomous AI applications that can complete tasks with no human intervention.
That is, agents that can autonomously handle tasks: they discover the necessary context and independently "reason" their way to accomplishing the given task.
One of the best descriptions of what AI Agents are and how they work comes from LangChain. Given user input (the task), these autonomous agents run in a loop: "if it is determined that an action is required, that action is then taken, and an observation (action result) is made. That action & corresponding observation are added back to the prompt (we call this an "agent scratchpad"), and the loop resets, ie. the LLM is called again (with the updated agent scratchpad)." (source: LangChain)
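To make that loop concrete, here is a minimal, runnable sketch of the pattern in Python. The fake_llm decision function and take_action stub are stand-ins of my own for a real model call and real tool execution; none of this is LangChain's actual API:

```python
from dataclasses import dataclass

# A minimal sketch of the agent loop described above. fake_llm and
# take_action are hypothetical stand-ins, not LangChain APIs.

@dataclass
class Decision:
    is_final: bool
    answer: str = ""
    action: str = ""
    action_input: str = ""

def fake_llm(task, scratchpad):
    # Pretend model: look something up once, then answer.
    if not scratchpad:
        return Decision(is_final=False, action="search", action_input=task)
    return Decision(is_final=True, answer=f"Answer based on: {scratchpad[-1][1]}")

def take_action(action, action_input):
    # Pretend tool execution returning an observation.
    return f"result of {action}({action_input!r})"

def run_agent(task, llm=fake_llm, max_steps=10):
    scratchpad = []  # the "agent scratchpad": past (action, observation) pairs
    for _ in range(max_steps):
        decision = llm(task, scratchpad)   # call the LLM with task + scratchpad
        if decision.is_final:
            return decision.answer         # no action required: we're done
        observation = take_action(decision.action, decision.action_input)
        scratchpad.append((decision.action, observation))
        # The loop resets: the LLM is called again with the updated scratchpad.
    return "Stopped: step limit reached."

print(run_agent("What is the state of AI agents?"))
```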
While this is the desired end state, we're clearly not "there" yet. One way to assess the current state of agents is through the lens of Yohei Nakajima, who lays out the following three categories:
Hand-Crafted Agents: Chains of prompts and API calls that are autonomous but operate within narrow constraints.
Specialized Agents: Dynamically decide what to do within a subset of task types and tools. Constrained, but less so than hand-crafted agents are.
General Agents: The AGI of Agents – still on the horizon rather than a practical reality today.
Our ultimate goal is to attain General Agents: autonomous, AI-powered applications that can solve complex tasks with no human intervention. However, we are still in the early phases of developing these truly cognitive agents. Many current AI systems resemble what Yohei calls "hand-crafted agents": highly specialized tools designed for specific tasks.
While this might seem like a limitation, it is, in my opinion, a crucial step towards more sophisticated AI capabilities. The development of these specialized agents lays the groundwork for the autonomous agents of tomorrow, anticipating the emergence of common architectures, patterns, tools and an "AI OS" that will underpin more advanced AI operations and abstract away many of the repetitive tasks of building and maintaining AI Agents. This is not dissimilar to how patterns, libraries, languages and operating systems emerged to support building the software we use daily.
I also believe that this gradual approach, starting with simple hand-crafted agents and inching towards truly autonomous ones, is important for a reason that has nothing to do with technology or AI: human distrust of autonomous AI agents.
The notion of fully autonomous agents raises concerns about trust and the potential displacement of human jobs. A more practical approach in the short term is what I referred to earlier as a Centaur-like model: the human and the AI working collaboratively, with the AI augmenting rather than replacing the human. Why do you think Microsoft calls it "Copilot"?
Case study: Cancer detection agents
A few years ago I worked at a company, Kheiron Medical Tech, developing AI products for cancer detection. The first version of these models focused on reading mammograms to detect breast cancer. One of the main challenges when bringing a product like that to market is the fear of displacing human radiologists (who are the buyers of the product), combined with the fear that patients will react badly to being diagnosed by an AI.
These fears are in tension with the tremendous benefits of medical imaging AI models. These models don't tire, and they can be trained on tens of millions of medical images, orders of magnitude more than a radiologist will read in their lifetime.
The way to address both the fear and uncertainty of adopting AI products, while still delivering the tremendous benefits of AI, is to have these models work alongside a human radiologist rather than displace them. There are several advantages to this approach.
First, the productivity value is easy to measure under this approach. Most countries (the US being a notable exception) apply an independent double-reader regime: a mammogram is read by two radiologists independently, and if they disagree on the reading, a third arbitrator is brought in to break the tie. An AI model can serve as one of these independent readers, taking over one of the two human reads per mammogram. That results in anywhere from 33% to 50% productivity gains for the human radiologists, who can now spend that time on tasks other than reading mammograms.
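Where does that range come from? Here is a back-of-the-envelope sketch; the disagreement rates below are illustrative assumptions of mine, not figures from Kheiron or any study:

```python
# Back-of-the-envelope sketch of the double-reader arithmetic.
# The disagreement rates are illustrative assumptions only.

def human_reads_per_case(ai_is_a_reader: bool, disagreement_rate: float) -> float:
    primary = 1 if ai_is_a_reader else 2   # the AI takes over one of two reads
    arbitration = disagreement_rate        # a third, human read on disagreements
    return primary + arbitration

for d in (0.0, 0.5, 1.0):  # from "readers never disagree" to "always arbitrate"
    before = human_reads_per_case(False, d)
    after = human_reads_per_case(True, d)
    saving = (before - after) / before
    print(f"disagreement rate {d:.0%}: human reads {before:.1f} -> {after:.1f}, "
          f"saving {saving:.0%}")

# With no disagreements, human reads drop from 2 to 1 (50% fewer);
# if every case went to arbitration, they drop from 3 to 2 (33% fewer),
# which brackets the 33-50% range mentioned above.
```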
Second, by having the human in the loop, patients get a sense of comfort and care from being treated by a human and not a machine.
Third, an AI agent working alongside a human asynchronously doesn't disrupt the human reader: the AI and the human work independently, without being in each other's way. This contrasts with some of the more popular applications of AI, notably code completion, which works synchronously with the human and can add a lot of cognitive load.
The issue with code completion is that it's in the way: it works in real time, synchronously with the human. I think a better model is one in which the AI works side by side with a human, asynchronously. There are many useful tasks the AI can help a human developer with without being "in the way", and some of these can be immensely productive.
My favorite example is using AI to write unit tests, or, more broadly, relying on AI to author exhaustive tests for a piece of code. That's a task the AI can do offline, asynchronously, without interrupting the human. When it's done, the human can review what the AI authored and submit it.
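As a sketch of what that offline workflow could look like, here is a minimal example using OpenAI's Python client. The model name, prompt, and file names are illustrative assumptions of mine, not a prescription:

```python
# A minimal sketch of the asynchronous "AI writes the tests" workflow.
# Model name, prompt, and paths are illustrative assumptions.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_tests(source_file: str, out_file: str, model: str = "gpt-4o") -> None:
    source = Path(source_file).read_text()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You write exhaustive pytest unit tests. "
                        "Reply with Python code only."},
            {"role": "user", "content": f"Write unit tests for:\n\n{source}"},
        ],
    )
    # Write the draft to disk for a human to review before submitting.
    Path(out_file).write_text(response.choices[0].message.content)

# Run offline (e.g. from a nightly job or CI step), then review the output:
# draft_tests("billing.py", "test_billing_draft.py")
```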
Are agents a better fit for AI in software development?
Mia, the AI breast-cancer detection product, is an example of a hand-crafted agent. It doesn't need much human interaction, but it can solve one and only one task.
Although we are presently at the stage of copilots that do not necessarily replace humans, I have no doubt that we will attain truly autonomous agents in the not-so-distant future.
The pace of innovation in AI, be it better models or new advancements such as AI21 Labs' MRKL system and the ReAct prompting strategy, is critical in shaping the future of AI agents. MRKL (Modular Reasoning, Knowledge and Language) routes requests from a language model to specialized modules such as calculators, APIs and databases, while ReAct has the model interleave explicit "reasoning" steps with actions and the observations those actions return, making tool use more effective and more transparent.
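In practice, a ReAct prompt asks the model to emit alternating Thought / Action / Observation steps. A hypothetical trace (the question and the search tool are made up for illustration) looks something like this:

```
Question: How many moons does Mars have, multiplied by 3?
Thought: I need to find out how many moons Mars has.
Action: search["moons of Mars"]
Observation: Mars has two moons, Phobos and Deimos.
Thought: 2 multiplied by 3 is 6, so the answer is 6.
Action: finish["6"]
```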
Agents aren't copilots; they are replacements. They do work in place of humans — think call centers and the like, to start — and they have all of the advantages of software: always available, and scalable up-and-down with demand. (source: Stratechery)
In the meantime, I think developers building agents should focus on centaur-like ones that solve a high-value task with measurable productivity gains for the human. I can't emphasize enough the importance of showing tangible outcomes or productivity gains when adopting this approach. An agent, like Tesla's Autopilot, that requires 100% attention from a human is neither an agent nor "auto". That might explain why Tesla had to cut the price of the product: the productivity gains to the human are negligible, even though the novelty and wow factor are tremendous.
There are many useful resources on this topic. Below are a few articles and folks I follow on Twitter:
Dr. Eric Topol. Eric writes a lot about the intersection of tech/AI and medicine, or Deep Medicine as he calls it.
The Rise of AI Agent Infrastructure. Good article with a wealth of useful links too.