Unlocking the Power of AI: Revolutionizing Software Development
Title based on this ChatGPT prompt: "what is a good title for a blog post on using AI for software development?"
The full-tilt adoption of LLMs like ChatGPT has, in my opinion, resulted in a “when” not “if” scenario for the usage of these tools by software development organizations. My projection is that these products will become as pervasive as other tools commonly used by software developers, like compilers, linters, and IDEs. Even though widespread adoption is only a matter of time, I argue that the initial phases of adoption need to be intentional and should address the following questions.
The first is to assess the (productivity) impact AI products like ChatGPT or Copilot have on an engineering organization. The second is to find use cases and tasks to which AI products can be applied, either at the individual software developer level or across the entire organization. The third is concerned with the risks of using these products.
Measuring the impact of AI products
Early, largely anecdotal evidence suggests that AI products yield productivity gains of anywhere from 30% to 80% on coding and other knowledge-work tasks.
“In fact, anecdotal evidence has suggested that productivity improvements of 30%-80% are not uncommon across a wide variety of fields, from game design to HR. These are not incremental gains, but rather massive effects that have the potential to transform the way we work.” Source: Secret Cyborgs: The Present Disruption in Three Papers
I argue that the devil is in the details here and that these gains are, at least for now, idiosyncratic. Therefore, the best way to evaluate the impact is to introduce AI products into your development life-cycle and observe how they change the way your team works.
One of the better models I have seen applied is to assume that the AI product is a junior member of your software development team. Not only are they junior, but they are prodigious: a savant of sorts. It is therefore up to the rest of the software development team to hone and “teach” that junior savant to be productive. That requires many interactions between the human software developers and the AI.
Applying this iterative interaction between AI and human is best done on a real software project. One pitfall I have observed is generalizing the powers of AI just from how well it tackles LeetCode problems. Mastering coding challenges doesn’t necessarily make the AI, or humans, good software developers. Therefore, my preference is to observe and assess this interaction on a real software project in which the AI is, as I mentioned earlier, treated as a +1 member of the software engineering team.
The question is, what are you observing and trying to assess? I’ll be honest, I don’t have a particularly good answer, but I will share what one of my teams is attempting to do. We picked a smallish project, on the order of 2-4 weeks of software development work for a team of 2 engineers. The team first completed the project without using AI. That set a baseline for their productivity (and de-risked the project, which I care about!). Next, they started from scratch, but rather than work alone they introduced AI products like ChatGPT and Copilot. They repeated these cycles numerous times. On every iteration they would start from scratch and measure the following (a rough sketch of how the first measurement might be computed follows the list):
What portion of the end product (think lines of code) was directly contributed by the AI.
Time spent nurturing the AI, both in terms of reviewing code the AI wrote and prompt engineering. This is the nurturing required to develop the savant junior developer.
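Here is a rough sketch of how the first measurement might be computed. This is not the team’s actual tooling; it assumes a hypothetical convention in which every commit that lands AI-written code carries an “AI-Assisted: yes” trailer in its commit message.

```python
# Rough sketch (hypothetical tooling): estimate the share of lines contributed
# by the AI, assuming commits with AI-written code carry an "AI-Assisted: yes"
# trailer in their commit message.
import subprocess


def lines_added(rev_range: str, grep: str = "") -> int:
    """Sum the lines added across commits, using git log --numstat."""
    cmd = ["git", "log", "--numstat", "--pretty=format:"]
    if grep:
        cmd += ["--grep", grep]
    cmd.append(rev_range)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    total = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():  # skip binary files ("-")
            total += int(parts[0])  # first numstat column = lines added
    return total


if __name__ == "__main__":
    ai = lines_added("main", grep="AI-Assisted: yes")
    everything = lines_added("main")
    print(f"AI-contributed share of lines: {ai / max(everything, 1):.1%}")
```

The second measurement, time spent nurturing the AI, doesn’t lend itself to this kind of automation and is more naturally tracked by hand.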
There’s an obvious issue with this experiment: bias. The human developers have already “solved” the problem (the baseline) and are therefore susceptible to steering the AI toward the known solution on every iteration.
The other issue is one of generalizability. Adopting the AI on this particular project might have worked well, but is that enough to deduce that the AI will work across all other teams and projects? Another unknown is whether the cost of nurturing is a one-time investment or must be repeated for every project. There’s no point in spending weeks or months on every project trying to optimize prompts and the manner in which the AI is used alongside humans. That should ideally be a one-time fixed cost rather than a continuous cost on every project. Ideally, you are able to train the AI to consume Jira tickets and emit GitHub PRs for a human to review :)
Use cases for your AI developer
The use cases for AI range from centaur-like ones, whereby the human and the AI work together, to ones in which the AI works independently of the human. Presently, a few centaur-like use cases exist. For example, GitHub’s Copilot offers an “autocompletion” feature whereby the human enters a text prompt describing the code they want, and boom, the AI writes the code.
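For instance, an interaction might look something like the following. This is purely illustrative: the human writes the comment and the function signature, and the body is the kind of completion a Copilot-style tool might suggest (actual suggestions vary with context and model).

```python
# Illustrative only: a comment-to-code interaction with a Copilot-style tool.

# Prompt written by the human, as a comment plus a function signature:
# "return the n most common words in a text, lowercased, ignoring punctuation"
def most_common_words(text: str, n: int) -> list[tuple[str, int]]:
    # --- everything below is the kind of completion the AI might suggest ---
    import re
    from collections import Counter

    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)
```

Calling most_common_words("to be or not to be", 2) would return [('to', 2), ('be', 2)]; the human’s job shifts from writing the body to reviewing it.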
The more exciting use cases, in my opinion, are ones in which the AI and the human work independently. Perhaps the AI can write test code for code authored by a human. Or perhaps one day the AI will be able to consume tickets in Jira and emit PRs that are then reviewed by a human (note, I have already seen a prototype of this...). That latter use case is one which I suspect will become the norm in the not so distant future. It is also illustrated in the slide below, taken from Matt Welsh’s excellent ACM talk Large Language Models and the End of Programming.
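To make the shape of that ticket-to-PR loop concrete, here is a speculative sketch. Every helper below is a hypothetical placeholder stubbed with dummy data, not a real Jira, GitHub, or LLM client; the point is the flow, with a human review gate at the end.

```python
# Speculative sketch of a "Jira ticket in, GitHub PR out" loop.
# All helpers are hypothetical placeholders stubbed with dummy data.
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str
    summary: str


def fetch_open_tickets() -> list[Ticket]:
    # Hypothetical: would query the Jira REST API for tickets the team has
    # explicitly marked as safe to hand to the AI.
    return [Ticket(key="PROJ-123", summary="Add retry logic to the HTTP client")]


def generate_patch(ticket: Ticket) -> str:
    # Hypothetical: would prompt an LLM with the ticket text plus repository context.
    return f"(patch drafted by the AI for {ticket.key})"


def open_pull_request(branch: str, patch: str, ticket: Ticket) -> None:
    # Hypothetical: would push the branch and open a PR via the GitHub API.
    print(f"PR opened from {branch}: {ticket.summary}\n{patch}")


def run_once() -> None:
    for ticket in fetch_open_tickets():
        patch = generate_patch(ticket)
        open_pull_request(branch=f"ai/{ticket.key.lower()}", patch=patch, ticket=ticket)
        # A human reviews and approves every PR; nothing merges automatically.


if __name__ == "__main__":
    run_once()
```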
There be dragons
Note: I am not a lawyer, so take the below with a big grain of salt and do consult your legal team.
The usage of AI products in the software development lifecycle introduces a new layer of risk for IP and ownership.
Consider the case of Copilot writing a piece of code for your product. Can you file a patent based on this code? Could Microsoft (GitHub’s parent company), or the software developers whose code Copilot was trained on, now make claims on your IP and product licensing rights?
Presently, AI tools have different licenses and terms that apply by use case. For example, the licensing terms for everyday use might differ from those for API access, and so on. The terms do matter when it comes to what the tool can then do with your input. For example, Copilot’s terms state that you own the “suggestions.” However, if it gives a “suggestion” that is based on code under a license requiring attribution, or under a copyleft license, then using that code violates the license. Additionally, computer-generated code is not patentable or copyrightable, which can complicate what a company deems to be its core, and therefore patentable, IP. Even the AI agrees that there are legal risks!