In the year of our Lord Claude Code
As a matter of fairness, if a human won’t review their own AI-generated code, we cannot expect other humans to do so either. The time required to conduct a thorough human review is far greater than the time it takes Claude to generate tens of thousands of lines of code. This wouldn’t be a problem if it were trivial to identify which parts of the code are good and which parts need improvement. If 5% of something is bad, and it’s non-trivial to identify which 5%, then the whole is unusable. Separating the wheat from the chaff is the whole point of a code-review.
My bot fights your bot
A follow-up might be that reviewers should incorporate AI to review code. This is similar to the follies of Lord Dorwin in the book Foundation.
Hardin continued: “…Lord Dorwin thought the way to be a good archaeologist was to read all the books on the subject—written by men who were dead for centuries. He thought that the way to solve archaeological puzzles was to weigh the opposing authorities…
― Isaac Asimov, Foundation
In the case of a code-review, the archaeological puzzles are the code-changes and their underlying motivation. An AI model is probabilistically operating on context that may or may not be effectively encoded in the code or the comments. Layered on top is the reviewers’ own lack of understanding of the model entrusted with the task. The lack of pre-existing context, combined with a model’s own probabilistic folly, effectively renders an AI generated code-review of AI generated code insufficient.
I don’t think that’s currently possible, as AI models aren’t able to evaluate business logic, architecture, and abstractions to the same level that a human with far more context on the same problem can. Partly because all that context isn’t, and potentially cannot be, encoded in the form of types, names, functions, documentation, etc. But then again, why put a human in the loop at all?
A computer can never be held accountable. Therefore a computer must never make a management decision.
– internal IBM training
Which brings me to the thing that I’ve been wondering about for the last few months.
What is a code-review?
Code-review, IMO, is an instrument for growing our shared understanding of the problem-space, and for building trust in the code we ship and in each other. It’s also a framework for establishing accountability. The person primarily accountable is the author, but some share of accountability lies with the reviewers. Accountability is one of the cornerstones of civilization; we like to believe that the world is more or less a just place because we have measures in place to hold each other accountable. Code-review is one such measure on a much smaller scale, i.e., a codebase. With the advent of vibe-coding and vibe-reviewing, we’ve temporarily decided to ignore the fundamental reason why code-reviews exist.
With whom does the burden of proof lie?
“What can be asserted without evidence can also be dismissed without evidence.”
– Christopher Hitchens
When a PR is submitted without evidence that it improves upon the state of the codebase, it can be rejected without evidence — especially when conjuring imagined dragons is incredibly cheap, and disproving their existence is materially difficult.
Is the human who reviews the PR using AI responsible for approving it? Is the entire team now responsible for maintaining the tens of thousands of lines of code added to the repository? When the application goes down, who you gonna call — Ghostbusters? And can the losses due to downtime be billed to the frontier model companies?
Closing thoughts
My opinions on this are in flux, to the point that I started writing this at the end of February, when the open source world started to reckon with the onset of AI drive-bys [1] [2]. Instead of this art being finished, it’s merely being abandoned on the 15th of May. Arguments over LOCs and the granularity of code-reviews are, IMO, futile distractions. Instead, we must re-examine what code-reviews stand for, whether they still serve us, and how they must change to remain effective.