The key insight our industry is struggling to weaponize is that an LLM can consistently collapse 60 minutes of work into 5… for the one-off throwaway script. But that productivity boost doesn't scale to large codebases. Why?
The LLM misconception
There is a school of thought that believes all software engineering is trending toward vibe-coding. That is, we'll soon be able to automate engineering with ChatGPT. This school misses a key issue: engineering operates on two timelines.
The first timeline is focused on now. This is the code we write to unlock some new functionality. The second timeline is focused on later. This is the work we do to ensure the next 100 features can be efficiently built.
For the sake of having a label, I will refer to the first timeline as implementation and the second as design.
- Design: The process of finding the most sustainable solution for a problem, given constraints (time, hardware, scaling, money, product, etc.).
- Implementation: The process of going from this design to tangible software.
LLMs accelerate implementation. They (almost) do what you tell them to do, and they do it quickly. But LLMs fail at design.
Imagine an architect tasked with designing and building a bridge. For the sake of argument, he's alone in this effort, forced to haul steel and do the welding himself. It would take years to complete the bridge. What if we introduced a team of construction workers? The bridge would be finished in far less time. This is what LLMs offer: a speed-up in building.
What if you send the builders off to construct the next bridge without an architect? They're likely to complete something, but few would dare to cross it. If the load distribution is miscalculated, the wrong materials are used in key tension zones, or the soil conditions are not accounted for... the bridge will collapse under pressure.
Misapplication of LLMs can negatively impact productivity.
This frustration reminds me of one of my favorite Jack Danger quotes:
You could, right now, delete all the automated tests at your company. You would get a small but real increase in shipping speed for a few minutes. And then all hell breaks loose.
That’s like taking a 12-hour loan from the mob.
The LLM unlock
Implementation is what becomes fast in an LLM world. Instead of manually keystroking every line of code, an engineer can describe their solution in natural language. That description is shaped by design, and LLMs have not yet proven their utility in design. Besides design and implementation, engineers also spend a lot of time reviewing each other's work; call this verification, which includes QA, code reviews, and architecture reviews.
Now, we can paint a (crude) picture of software engineering:

"Vibe-coding" is the attempt to do both design and implementation with an LLM. Besides manually testing, verification is mostly left up to the LLM.
So, why can an LLM collapse 60 minutes of work into 5 for the one-off script or SQL query? Well, now we understand that:
- The functional requirements are limited. This means we don't need to spend much time verifying (QA'ing) the results.
- A script's duties tend to have a small surface area. For example, a script to generate a report or a read-only SQL query (a sketch follows this list). In these cases, you may not need to verify the code closely because its implications are harmless. (Using our architect metaphor, it's the difference between building a cardboard dollhouse and a bridge meant to carry people.)
- The need for design is low. If the solution is hard to iterate on in the future, just throw it away and start again from scratch.
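To make that concrete, here is a minimal sketch of the kind of one-off, read-only script meant here. The file name, column names, and report format are all hypothetical, not from anything above; the point is that the output is easy to eyeball, so verification takes seconds.

```python
# one_off_report.py: a hypothetical throwaway script of the kind an LLM
# can produce in minutes. The file name and column names are illustrative.
import csv
from collections import Counter

# Read-only: we only aggregate an existing export, never mutate data.
with open("signups_2024.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Count signups per plan tier.
by_plan = Counter(row["plan"] for row in rows)

print(f"Total signups: {len(rows)}")
for plan, count in by_plan.most_common():
    print(f"  {plan}: {count}")
```

If the numbers look wrong, you throw the script away and regenerate it; there is no design worth preserving.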
Scaling LLMs to larger codebases
The aspiration we should all have (to use a civil engineering analogy) is to give LLMs the work of building the odd Bakersfield Walmart parking lot so we can stay focused on designing the Golden Gate Bridge.
Knowing these constraints, how do we scale LLMs to larger codebases? No one knows yet. But here is where I'd like to see investment: guidance and oversight.