The Maintainer's Dilemma

Wed, 20 May 2026 10:00:00 -0400

A protected branch requires a second person to review a change before code ships. The rule exists because humans make mistakes, and a second pair of eyes catches what the first one missed. But what happens when one of those reviewers is a robot? What if both are?

Currently I can ask an AI to open a pull request on my repository and then merge it myself. Or I can write the code and ask an AI to review it. In both cases, the branch is technically “protected.” But what does that even mean now?

Those are questions worth asking. But they tend to get all the attention while a more immediate problem goes unaddressed. Right now, across my repositories, there are open pull requests from people who took time to understand the codebase, write tests, and submit clean patches. I haven’t reviewed them. The irony is that I haven’t reviewed them precisely because I care deeply. I know how big a deal it is to take the time to open a pull request on a project, and even more so on a popular one. I know that each one deserves an honest review, and that takes time I don’t have for projects I maintain as a volunteer. For every open source project with a funded maintainer, there are millions with an unpaid human staring at a growing backlog, wondering whether the right thing to do is spend the weekend triaging issues or just close the laptop and go outside.

AI tools can now do credible code reviews, write patches, and triage issues. The question is no longer whether they’re good enough to be useful. The real questions are where “useful” becomes “liability,” and whether leaning on them breaks something we can’t easily rebuild — the trust between maintainers and contributors, the judgment that comes from experience, and the community that forms around that exchange.

118 Open Pull Requests

I checked my GitHub notifications last week and closed the tab. Cobra has 243 open issues and 118 open pull requests. Afero has 114 issues and 55 PRs. I created both projects. I haven’t meaningfully reviewed a PR on either in months.

In spite of my inaction as a reviewer, these are actively maintained projects. Cobra powers kubectl, GitHub CLI, Hugo, and hundreds of thousands of other tools. When you type kubectl get pods or gh pr list, Cobra parses your command. Afero sits inside Hugo, inside Cobra itself, inside hundreds of thousands of other projects. A careless merge on Cobra could break Kubernetes tooling. A bad review on Afero could open a filesystem vulnerability that quietly propagates through everything downstream.

I created Cobra because I needed a specific CLI UX for Hugo and no existing library could support it. I split it out as its own project, thinking others might find it useful. I never imagined I’d still be maintaining it a decade later, or that both projects would become critical infrastructure for so many others. I just wanted to build something useful for me and maybe a few friends. But does open sourcing code mean I’m obligated to maintain it indefinitely? With each new project I release, there’s less time for the existing ones. Some of those PRs have been waiting for years. There’s a reported security vulnerability in Afero’s BasePathFs that’s been sitting open since June 2025. Until I wrote this post, I didn’t realize it was there, because the backlog is that overwhelming.

The math of maintenance doesn’t work. It’s a well-known problem across open source (relevant XKCD). The number of contributions grows faster than the number of maintainers, and the time required to review each one grows with a project’s complexity and impact. Some projects attract volunteer co-maintainers, but that brings its own problem: no one is clearly responsible, so everyone picks what matters to them and the rest just sits. Cobra is intentionally slow to change — too many projects depend on it to casually merge anything — so each change requires more thorough review, not less. Many of my other projects fall into the gray area between maintained and abandoned. I’d describe it as optimized maintenance around the most critical paths, but that distinction matters a lot more to me than it does to the person who submitted a fix eight months ago and never heard back.

This isn’t just my problem. GitHub hosts over 420 million repositories. I was very fortunate to be part of the inaugural cohort of the Secure Open Source Fund, a real investment that made a real difference. But even after expanding to several cohorts, it covers about 200 projects. OpenSSF scans a million critical projects weekly. Tidelift pays maintainers. Add it all up and you’re covering thousands of projects. It’s meaningful work. It’s also a rounding error against the actual surface area.

Ninety-six percent of codebases contain open source components, and the foundations they build on are maintained by people staring at a backlog they’ll never clear, wondering if this is the weekend they finally burn out or just stop checking. And that’s before you get to maintainer guilt — the knowledge that people are counting on your work, that you have the capacity to help, but that you can’t keep up.

Enter the Robots

I’ve been experimenting with AI tools across a couple of repositories: Jules on fileflow and pathologize, which have fewer dependencies and more room to try things. I’ve also been running GitHub Copilot on Afero, which has more dependencies, but its modular architecture lets me add new backends without touching the critical paths other projects depend on.

I turned Jules loose and watched email after email arrive with new PRs. It looked promising. Then I went on a cruise. While I was at sea, Jules kept going, opening more pull requests each day because I hadn’t merged the first ones yet. By the time I got home, I had over 120 PRs across the two projects. I set aside a morning to review them all, only to find they represented roughly five distinct change sets, each resubmitted daily over several weeks. The PRs themselves weren’t wrong; Jules had found real issues. But none were quite right either, and each needed course correction before I could merge it. With guidance, Jules made the adjustments, and the overall direction showed promise. But so far the experiment has created more maintenance work, not less: I had to verify that each of those 120 PRs was actually a duplicate before closing it. The tool meant to reduce my backlog had added to it.

Jules also created these PRs as me, not as Jules — which raises its own questions about attribution and accountability. From the repository’s perspective, I authored those changes. But I didn’t write a line of them. If one of those patches introduced a bug or a vulnerability, the commit history points to me. Most contributor policies weren’t written with this scenario in mind, and the standard CLA doesn’t distinguish between code a human wrote and code a human directed an AI to write on their behalf.

Currently, it appears that Jules has no memory of its own previous work and no ability to check open PRs. It scans the repository, finds an issue, opens a PR, and stops. If you don’t merge it, Jules finds the same issue next time and opens another PR. It has no way to know you’re aware of the problem and haven’t merged it for your own reasons: maybe you disagree with the fix, maybe it’s lower priority, maybe you’re on a boat and won’t get to it for a couple of weeks. That context is invisible to the tool. Jules found a real vulnerability. Time-of-check to time-of-use (TOCTOU) bugs in file operations are a genuine class of security problem, and Jules was right to flag this one… once.
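For illustration only, here is a minimal Go sketch of the check that the paragraph above describes as missing: before opening a PR, the bot looks for an open PR that already addresses the same finding, and it skips findings a maintainer has explicitly deferred. This is not Jules’s code and nothing here is a real API. Finding, PullRequest, Fingerprint, and shouldOpenPR are hypothetical names, and the sketch assumes the bot records a fingerprint on every PR it opens.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical record of one issue a review bot detected.
type Finding struct {
	File string // path where the issue was found
	Rule string // hypothetical identifier for the class of issue, e.g. "toctou"
}

// Fingerprint returns a stable key derived from the rule and file, so
// rescanning unchanged code yields the same value every day.
func (f Finding) Fingerprint() string {
	sum := sha256.Sum256([]byte(f.Rule + "\x00" + f.File))
	return hex.EncodeToString(sum[:8])
}

// PullRequest is a hypothetical, minimal view of an open PR. We assume the
// bot stored the finding's fingerprint when it opened the PR.
type PullRequest struct {
	Number      int
	Fingerprint string
}

// shouldOpenPR reports whether the bot should open a new PR for f. It returns
// false if a maintainer deferred this fingerprint, or if an open PR already
// carries the same fingerprint.
func shouldOpenPR(f Finding, open []PullRequest, deferred map[string]bool) bool {
	fp := f.Fingerprint()
	if deferred[fp] { // reading a nil map is safe and yields false
		return false
	}
	for _, pr := range open {
		if pr.Fingerprint == fp {
			return false
		}
	}
	return true
}

func main() {
	f := Finding{File: "example.go", Rule: "toctou"} // hypothetical finding
	var open []PullRequest
	deferred := map[string]bool{}

	// First scan: nothing is open yet, so the bot opens a PR.
	fmt.Println(shouldOpenPR(f, open, deferred)) // true
	open = append(open, PullRequest{Number: 1, Fingerprint: f.Fingerprint()})

	// The next day's scan finds the same issue; the open PR suppresses a duplicate.
	fmt.Println(shouldOpenPR(f, open, deferred)) // false
}
```

The deferred set is the interesting part: it is one crude way to hand the tool the “I know, and I’m waiting for my own reasons” context that it otherwise can’t see. Keying the fingerprint on rule and file alone is also an assumption; it would treat two different problems of the same kind in one file as a single finding.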

For the mechanical work — flagging issues, updating dependencies, drafting boilerplate responses — the tools are genuinely useful. But Jules and Copilot couldn’t tell me whether one of those 55 Afero PRs belongs in the project at all. That judgment requires knowing the codebase’s past and future, not just its present state.

These tools work only from what’s visible: the code, the open issues, the PR history. Maintainers work from the visible and the invisible: the context that never made it into comments, the constraints nobody wrote down, the internal debates that shaped the API. The gap between those two things is where human judgment is most irreplaceable, and where AI is most blind.

Russ Cox, whom I worked with on the Go team, put this well in a recent discussion about AI contributions: “People brag about codebases of hundreds of thousands of lines that have never been viewed by people, churned out in record time. On closer inspection, these codebases invariably turn out to be more like dancing elephants than useful engineering artifacts.”

He’s right about novel code. But I keep coming back to the distinction between writing new software and maintaining existing software. Dependency updates aren’t dancing elephants. Triaging a stale issue isn’t a creative act. Telling a contributor “thanks, we’re not accepting changes to this API” is just keeping the lights on. And right now, for millions of projects, the lights are off.

That isn’t the biggest challenge, though. What most people don’t realize is that evaluating and merging changes is far harder than writing new code. Understanding how a change fits into the existing codebase, its history, and its plans requires knowledge that’s partly invisible — not in the code, not in the comments, not in the issues. It’s in the maintainer’s head. And some of it is deeply creative work, requiring the kind of judgment no model can replicate.

What “Protected” Actually Protects

The Go project is genuinely beautiful to me — meticulous reviews, design discussions that ran for months before anything shipped, a review culture refined over 15 years. That’s the ideal. But Go is exceptional in ways most projects can’t replicate: full-time contributors funded by Google, a mandate to build a language meant to last 50 years, rare external deadline pressure.

The Go team recently had a long thread about whether to accept AI-authored contributions; it’s the same discussion where Russ Cox’s quote above originated. These are people I worked with for years: same reviews, same proposals, same arguments at the same whiteboards. Reading the thread, I could hear their voices. And I could see how every one of them was right, and why that was the problem.

Rob Pike went first and didn’t equivocate: “This is a very slippery slope. Be careful on your first step. I recommend simply saying, no.” That’s Rob. Direct, principled, and usually right. Then Alan Donovan pointed out the uncomfortable reality: “I suspect a significant fraction of CLs we receive today already include fragments of LLM-generated code, whether the authors admit it or not.” The horse, in other words, has already left the barn.

Russ Cox wrote the most thoughtful response I’ve seen anyone write about this. His core point: “The most important thing we can do is to maintain our usual processes around code review and code quality… That same bar must still apply when some or all of the code was written with the help of AI-based tools.” And: “Your responsibilities are not lessened when you use AI tools for your work.”

Every one of these positions is reasonable. And every one of them shares an assumption that exposes the core of the dilemma: they assume there are humans available to review.

Go can afford to say “maintain the same bar” because it has full-time contributors funded by Google. It has reviewers. It has a review culture that’s been refined for over a decade. Rob can say “just say no” because Go has enough people to say “yes” to the important things.

Afero doesn’t. Most open source projects don’t. When Rob Pike says no, the Go project keeps functioning. When I say no, the PR just sits there. Those are different kinds of “no.”

There’s a spectrum, and where you land on it depends on what you’re actually choosing between. In practice, maintainers face five options:

Human writes, human reviews.
AI writes, human reviews.
Human writes, AI reviews.
AI writes, AI reviews, human clicks merge.
AI writes, AI reviews, AI clicks merge.

Each step down that list typically trades rigor for velocity, and trust for throughput. But for most projects — certainly for many of mine — there’s a sixth option that doesn’t appear on that list: human or AI writes, nobody reviews.

What We’re Really Protecting

When a maintainer reviews a contributor’s PR, there’s an unspoken contract. The contributor invested hours understanding the codebase, writing tests, and submitting something clean. The reviewer invests hours evaluating it, pushing back, and suggesting improvements. Both people learn. The reviewer understands a new corner of the project. The contributor gets better at the codebase’s idioms. A relationship forms. That exchange is a big part of what makes open source a community instead of just a supply chain.

Bryan Cantrill at Oxide described this contract precisely in their internal policy on LLM use: ordinarily, “it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion.” When the writing is AI-generated, “this social contract becomes ripped up.” The same holds for code review: we expect the reviewer to invest effort because the author invested effort. If neither did, what’s the review even for? Oxide’s answer is that the human stays accountable regardless; the tool doesn’t absorb the responsibility. That’s the right instinct. But it assumes someone is actually there to take responsibility.

For most projects, nobody is. The social contract isn’t being broken by AI — it’s already being broken by silence. The contributor who submitted a clean, well-tested patch six months ago and never heard back isn’t experiencing a degraded version of the ideal. They’re experiencing nothing.

Is a perfect social contract that never happens better than an imperfect one that does?

A response from an AI in a day might be more respectful than silence from a human forever.

The Experiment Ahead

I’ve decided to find out. I’ve been staring at 55 open PRs on Afero long enough to know that deliberation has become its own form of neglect.

Will AI tools make me more engaged, freeing me to focus on the decisions that actually need me? Or will the work feel less connected, the human element reduced by one more degree? I don’t know how it feels to have an AI review a PR instead of a human, or whether accountability holds when the effort on both sides is diminished. That’s the experiment.

Russ said something else in that thread I keep returning to: “The most important thing is to keep thinking. The tools make it very easy to turn off your brain, but if you are careful to avoid that trap, you can produce good work.” That’s the line I’m trying to walk. Let AI handle the volume. Stay responsible for the judgment.

There’s no universal policy that works for both Go and Afero. There shouldn’t be.

The protected branch is still protected. I’m just not sure what that even means anymore.

Have you run into this? I’d especially like to hear from maintainers who’ve tried AI review — what held, and what broke.