Are agents a better fit for AI in software development?
Async AI has less impact on developer flow than sync
In a previous post I talked about how many members of my software development team at StrongDM found little value in using GitHub Copilot. Since then, I have received a lot of feedback, ranging from developers who find Copilot, and GenAI code-completion products more generally, immensely useful, to those who found them a waste.
In short, our experience shows that GitHub Copilot is marginally useful at best, and even then it appears to be good only at very simple software development tasks. Is it worth the incremental price of $10-$39 per developer per month? In my opinion, it is not.
First, it’s worth noting that the value, or utility, of AI is a question that is top of mind for many organizations. It’s evident that there are clear AI winners: Nvidia, the cloud hyperscalers, data center operators, utility companies, and so forth. These are all “picks and shovels” winners. The winning AI application/company, with the possible exception of OpenAI and its peers, has yet to emerge.
A recent report out of Goldman Sachs, “Gen AI: too much spend, too little benefit?”, offers a more sobering reality check on the AI hype. Yes, AI is fantastic; we’ve all experienced the “magic” of ChatGPT and similar GenAI models/products. But AI is prohibitively expensive, both in building large-scale models and in inference costs. And, as mentioned earlier, we are still awaiting the emergence of winning AI applications.
The promise of generative AI technology to transform companies, industries, and societies continues to be touted, leading tech giants, other companies, and utilities to spend an estimated ~$1tn on capex in coming years, including significant investments in data centers, chips, other AI infrastructure, and the power grid. But this spending has little to show for it so far beyond reports of efficiency gains among developers. And even the stock of the company reaping the most benefits to date— Nvidia—has sharply corrected. Source: Goldman Sachs
Is that it, then? Is that the end of AI in software development? I think not. I have been, and still am, a long-time advocate of AI+Human cooperation, and I still see tremendous value in using AI in software engineering, and in many other disciplines. I just think we’re focused on the wrong use case.
The future isn’t code-completion, it’s task completion
Much of the present-day focus on the value of AI in software development is on code-completion products like GitHub Copilot. As previously mentioned, the uptake, and more importantly the value, of these products today is questionable. Some developers swear by them and the productivity gains they bring, while others scorn them, vowing never to use them again.
I think the issue is that code-completion is a use case that is very hard to get “right”. The biggest challenge these code-completion products face is that they interrupt the developer’s flow. I covered this topic in an earlier post, in which I wrote the following:
One concept that I have come across recently is that of flow. Mihály Csíkszentmihályi has written extensively about the state of flow and its relationship to a happy and fulfilling life. According to Wikipedia,
“… flow, also known as the zone, is the mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity.”
I have since come to believe that in order to get the most out of our engineering team, you have to provide them an environment that fosters this sense of flow. It follows then, that what you want to measure are all the elements that stand in the way of getting to flow. One of the most critical elements that lead to flow is uninterrupted time, which in my experience is the main obstacle standing in the way of engineering productivity.
A developer in a state of flow feels very productive, and I posit that the very last thing they need is to be interrupted, which is exactly what these code-completion products do.
Consider recent data out of Google on the acceptance rate for AI code suggestions, which is defined as follows:
Fraction of code created with AI assistance via code completion, defined as the number of accepted characters from AI-based suggestions divided by the sum of manually typed characters and accepted characters from AI-based suggestions
An acceptance rate of 50%, while admirable, is, I argue, highly disruptive, especially for a developer in a state of flow: every second suggestion is turned down. That is not too promising, and it can be annoying.
We observe that with AI-based suggestions, the code author increasingly becomes a reviewer, and it is important to find a balance between the cost of review and added value. We typically address the tradeoff with acceptance rate targets. Source: Google
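The quoted metric is simple to state in code. Here is a minimal sketch of it; the function name and the character counts below are illustrative, not Google’s:

```python
def assist_fraction(accepted_chars: int, manual_chars: int) -> float:
    """Fraction of code created with AI assistance: accepted characters
    from AI suggestions divided by the sum of manually typed characters
    and accepted characters."""
    total = accepted_chars + manual_chars
    return accepted_chars / total if total else 0.0

# If a developer accepts 400 AI-suggested characters and manually
# types another 400, half the code was "created with AI assistance".
print(assist_fraction(400, 400))  # 0.5
```

Note that this character-level fraction says nothing about how often a suggestion was shown and rejected, which is where the interruption cost lives.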
The issue with code-completion is that it’s in the way: it works in real time and synchronously with the human. I think a better model is one in which AI works side by side and asynchronously with a human. There are many useful tasks that AI can help a human developer with, without being “in the way”, and some of these can be immensely productive.
My favorite example is using AI to write unit tests, or, broadly speaking, relying on AI to author exhaustive tests for a piece of code. That is a task the AI can do offline, or asynchronously, without interrupting the human. When it is done, the human can review what the AI authored and submit it. Similarly, AI can help with refactoring/linting code, increasing test coverage, writing/accepting PRs, reviewing code, and so forth. These are all tasks that AI can help with without being in the way of the human developer. They are also time-consuming tasks, so by relying on the AI to attempt them we might see productivity (and quality) gains for the human developer too.
In summary, I think the future of AI in software engineering looks more like what Codium is working on than like traditional code-completion products such as Copilot.
IMO the problem with this take is:
1. Figuring out types, etc., is a task that the AI does well and that is itself flow-disrupting. You assume the interjections will disrupt flow, but you have to consider that they may preserve it.
2. The developer may not see the suggestions when they're in flow anyway, though they can indeed be distracting.
3. The AI's suggestions are still wrong enough that everything needs to be reviewed anyway. So, pair-programming it (with AI as author) is probably the fastest way to get things done.
One of the challenges with AI pair programming is that the choruses are sung so fast it's hard to prepare for the next verse. The programmer is denied the easy tasks, which doesn't help with the most limited budget: deep thought.