Matt Sears

I built shipyard so I could build lightwork

On the morning of June 4th I typed one command into a terminal. It audited my app — accessibility, SEO, privacy, security, docs, a dozen other dimensions — and filed 64 GitHub issues, each one labeled, severity-ranked, and written like a competent engineer's bug report. The skip-navigation link missing from the authenticated app shell. Tap targets two pixels under Apple's 44-point floor. A cookie-consent link that wasn't keyboard-activatable.

That evening I typed one more command and went on with my night. Between 9:13 and 10:33 PM, thirteen pull requests merged — each one written by an autonomous worker in its own git worktree, each one closing one of the morning's issues, each one merged by CI without me touching the keyboard. The skip-link fix merged 26 minutes after its PR opened.

I didn't write a line of any of them. This post is about how an ordinary Thursday got to look like that — because the machinery behind it is the most leverage I've ever gotten out of a tool I built.

Two projects, one bottleneck

For the last ten weeks I've been working on two projects at once. The first is Lightwork, a volunteer task management app — Expo (React Native) + Firebase, shipping to iOS, Android, and the web. The second is Shipyard, an autonomous engineering loop built on Claude Code: it finds work, refines it into tickets, and turns the issue backlog into merged pull requests, with the human kept at exactly the points where judgment matters.

The relationship between the two is the point of this post: I built shipyard because I was the bottleneck in building lightwork.

The app, and the four months that didn't ship

Lightwork is the product. Users join groups (a church, a neighborhood association, a mutual-aid crew), post tasks that need volunteers — with locations, schedules, and recurring series — and sign up to help. Real-time data on Firestore, group search via Algolia, maps, per-task messaging, localization. One codebase for iOS, Android, and a PWA.

Screenshots: Lightwork home screen showing a feed of volunteer tasks; Lightwork map view with task locations pinned; Lightwork task detail screen with volunteer signup.

Lightwork didn't start in React Native. The first version was about four months of work in FlutterFlow — a low-code platform where you build cross-platform apps visually instead of writing them by hand. I picked it because low-code was supposed to be the fast way to build an app. I built a lot of real features there, but the developer experience was atrocious: I had no way to write any meaningful tests, and too much of what FlutterFlow did was a black box — I didn't fully understand how it worked, and I didn't have full control over what it was doing. The prototype never quite reached the point where I could ship it. In April 2026 I started over in React Native, and the first task I completed was porting that FlutterFlow prototype into the new codebase.

Ten weeks later, the repo is at 1,671 commits, version 2.51.0, and roughly 153,000 lines of TypeScript (about 80k of app code and 73k of tests), and I'm working through the App Store and Play Store launch checklists. Four months of low-code got me a prototype; ten weeks of this got me to launch prep. One person.

The difference was Claude Code — and, very quickly, what I learned about where its real constraint is. An agent session is brilliant at writing code and helpless at logistics. Drive one by hand and you become the conveyor belt: write the prompt, watch the checks, rebase when something else lands first, click merge, start the next one. The agent implements a well-specified issue faster than you can write the spec, and much faster than you can shepherd the result home. Multiply that across a backlog of hundreds of issues and the math is brutal: the human in the middle is the slowest part of every cycle.

The constraint wasn't the model. It was me.

So I built the machine that does my old job

Shipyard is an engineering loop — find work → refine it → do it → merge it → repeat — packaged as a Claude Code plugin so the loop runs without a human driving each step. It does three things:

Shipyard: audits, third-party services, user feedback, and manual entry feed an issue backlog; an orchestrator dispatches issue workers that ship the work

  1. It finds work. /audit runs specialized agents across seventeen dimensions — performance (Lighthouse), security, accessibility, SEO, privacy/GDPR, PWA readiness, release readiness, tech debt, testing gaps, docs rot, observability, API surface health, and more — and files a labeled, severity-ranked GitHub issue for every finding. That's where June 4th's 64 issues came from.
  2. It refines work. /refine-issues takes raw input that isn't ready to be worked — a user-feedback form submission, a feature request with open questions — and classifies and rewrites it into an implementation-ready ticket. Anything touching real user feedback is gated behind a needs-human-review label: no code-modifying agent runs until a human signs off.
  3. It does work. /do-work is the orchestrator. It ranks the eligible backlog and keeps N parallel workers in flight, each in its own isolated git worktree on its own branch. Each worker implements the smallest change that satisfies the issue, opens a PR that closes it, and arms auto-merge. Green CI means it merges itself and the next worker dispatches. When CI breaks — on a PR, or on main itself — the orchestrator diverts a worker to fix that first.
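To make the dispatch step concrete, here is a minimal Python sketch of "rank the eligible backlog and keep N workers in flight." It is an illustration under stated assumptions, not shipyard's actual code (which, per the numbers later in this post, is mostly bash and markdown). The severity names, the `blocked` and `ci-failure` labels, and the idea that a CI break shows up as a labeled issue are all hypothetical; only the `needs-human-review` label comes from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical severity ordering; shipyard's real labels and ranking may differ.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# "needs-human-review" is named in the post; "blocked" as a label is an assumption.
HOLD_LABELS = {"needs-human-review", "blocked"}


@dataclass
class Issue:
    number: int
    severity: str
    labels: set = field(default_factory=set)


def eligible(issue):
    """An issue is dispatchable unless a hold label is present."""
    return not (HOLD_LABELS & issue.labels)


def next_dispatch(backlog, in_flight, concurrency, ci_broken=False):
    """Pick the issues to hand to idle workers on this pass.

    backlog: open issues; in_flight: issue numbers already being worked;
    concurrency: N, the maximum number of parallel workers.
    """
    free_slots = concurrency - len(in_flight)
    if free_slots <= 0:
        return []
    candidates = [
        i for i in backlog if eligible(i) and i.number not in in_flight
    ]

    def sort_key(issue):
        # When CI is red, CI-fix issues (hypothetical label) jump the queue.
        ci_first = 0 if (ci_broken and "ci-failure" in issue.labels) else 1
        rank = SEVERITY_RANK.get(issue.severity, len(SEVERITY_RANK))
        # Ties break toward the oldest (lowest-numbered) issue.
        return (ci_first, rank, issue.number)

    return sorted(candidates, key=sort_key)[:free_slots]
```

With three worker slots and one issue already in flight, a call such as `next_dispatch(backlog, in_flight={5}, concurrency=3)` returns at most two issues, highest severity first, and never one carrying a hold label.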

Shipyard at a glance: five stages from issue sources through refinement and human review, into the orchestrator, out to parallel workers in isolated worktrees, and into the PR pipeline with auto-merge

The design bet that makes it compose: everything is a GitHub issue. Shipyard doesn't care where an issue came from — an audit agent, a Sentry integration, Dependabot, a support tool, or a human. If a service can file a GitHub issue, shipyard can work it. Production error → issue → reproduced → fixed → PR → merged, with zero human steps if you let it.

And to be honest about June 4th: fifteen PRs were dispatched that evening and thirteen merged on the spot. The other two needed more time — one merged two days later, the other three. The loop doesn't bat 1.000; it bats well enough that the stragglers are the part I notice.

Self-healing software

That design bet pays off hardest with the inputs that aren't me. Lightwork's production errors now file their own bug reports: Sentry is integrated with the GitHub repo, and Crashlytics watches the mobile apps — when either one catches an error or a crash, it automatically opens an issue, stack trace and all. From there the issue rides the same loop as everything else: refined, investigated, dispatched, fixed, merged. An error that surfaces in production can be patched within minutes of the first time it fires, without anyone triaging a dashboard first.

Users feed the loop too. The app has a feedback form, and submissions flow straight into the issue backlog — feature requests, bug reports, all of it. But this lane has guardrails, because it's the one lane the public can write to. Shipyard only auto-works issues from trusted authors; anything from an untrusted source — the feedback form included — is held behind a human-review gate until I explicitly approve it. The security model in one sentence: anyone can ask, but only I decide what the machine builds. Once I've signed off, though, a user-reported bug or user-requested feature flows through exactly the same machinery as everything else — which means users of a one-person app get their bugs fixed and their feature requests shipped.
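The gate described above can be sketched as a single fail-closed predicate. This is a minimal illustration, not shipyard's implementation: the `human-approved` label and the author names are hypothetical stand-ins for whatever approval mechanism and allowlist the real tool uses. Only `needs-human-review` is a label named in this post.

```python
REVIEW_LABEL = "needs-human-review"   # named in the post
APPROVED_LABEL = "human-approved"     # hypothetical approval marker


def may_auto_work(author, labels, trusted_authors):
    """Return True only if a code-modifying agent may pick up the issue.

    Fails closed: an untrusted author with no explicit approval is held,
    even if nobody remembered to apply the review label.
    """
    if REVIEW_LABEL in labels:
        return False  # gate is still closed, regardless of author
    if author in trusted_authors:
        return True
    return APPROVED_LABEL in labels


trusted = {"maintainer"}  # hypothetical allowlist
print(may_auto_work("maintainer", set(), trusted))
print(may_auto_work("feedback-form", {REVIEW_LABEL}, trusted))
print(may_auto_work("feedback-form", {APPROVED_LABEL}, trusted))
```

A real gate would also need to verify who applied the approval, since a label that anyone can add is not a sign-off. The sketch leaves that check out.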

The next step is closing that loop outward. The feedback form already captures the reporter's email address, so notifying someone when their report ships — "the bug you filed on Tuesday was fixed on Wednesday" — is one small automation away. A feedback form that answers you back.

Where the human fits in the loop

It's an autonomous loop, not an unsupervised one — the division of labor is explicit, and it cuts along what each side is actually good at.

What shipyard does that I can't: be in eight places at once, and never get bored. It keeps parallel workers running around the clock, watches every CI run to completion, rebases stale PRs, re-fixes failing checks, applies every label and changelog convention without ever forgetting one, and grinds through the long tail of small, well-specified fixes that would never justify an hour of a human's day. None of that work is hard — there's just far more of it than one person can physically do.

What I do that shipyard can't: decide what's worth building, and why. I write the issues (or approve the ones it drafts), make the product and design calls, sign off on anything derived from real user feedback before any code-modifying agent is allowed near it, review the PRs where correctness has real stakes, and unstick the work it hands back as blocked.

And the loop closes in both directions: /my-turn surveys the open PRs, the backlog, and recent comments, and produces a prioritized list of exactly the items that are blocked on me rather than on the machine — the read-only, human-facing counterpart to /do-work. I gate what it builds; it tells me where I'm the bottleneck.

It built the platform, not just the patches

It would be easy to read all of this as "an agent that does small fixes." The part that changed my mind about what these loops are for is that shipyard built lightwork's infrastructure — the unglamorous platform work that usually eats weeks.

It built the entire CI/CD pipeline: lint, typecheck, and the unit suite on every PR; end-to-end shards against Firebase emulators; Lighthouse and bundle-size budgets; store-credential validation; and the deploy side — web deploys, EAS builds for iOS and Android, and fully automated submission to the App Store and Google Play, store metadata included. I iterated on that pipeline the same way everything else here works: file an issue, let a worker ship the change. At this point it's genuinely polished — release-please cuts the versions, and the release notes are drafted by Claude without me writing a word of them.

It also wrote the safety net that makes the speed survivable: over 3,000 unit tests, over 300 end-to-end tests, and about 50 smoke checks — a post-release suite that runs against production plus Maestro flows on real mobile builds. That's the part people miss about merging on green CI: an autonomous loop is only as trustworthy as what "green" means. Shipyard raised the bar it has to clear, then kept clearing it — which is how the repo absorbs hundreds of merges a month without breaking underneath me.

Two more pieces of platform work I'd have dreaded doing alone. First, fully isolated test and production environments — separate Firebase projects, separate bundle IDs, separate OAuth apps — enforced by an invariant table that's checked at build time and at runtime, so a half-configured binary refuses to boot instead of quietly talking to the wrong database. Second, authentication: email plus Google, Apple, and Facebook social logins, working across iOS, Android, and the web. Anyone who has wired OAuth redirects, URL schemes, and per-platform app registrations across three platforms and two environments knows exactly how much tedium is buried in that sentence.
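An invariant table of this kind can be sketched in a few lines. The version below is a hypothetical Python illustration, not Lightwork's actual check (the app itself is TypeScript), and every project name, bundle ID, and client ID in it is a placeholder. The idea is that each environment pins the values that must travel together, and any mismatch or missing value raises instead of letting the app start.

```python
# Placeholder values only; none of these are Lightwork's real identifiers.
INVARIANTS = {
    "test": {
        "firebase_project": "example-app-test",
        "bundle_id": "com.example.app.test",
        "oauth_client": "example-test-client",
    },
    "production": {
        "firebase_project": "example-app-prod",
        "bundle_id": "com.example.app",
        "oauth_client": "example-prod-client",
    },
}


class EnvironmentMismatch(RuntimeError):
    """Raised when a build's config does not match its declared environment."""


def assert_environment(env, config):
    """Refuse to continue unless every pinned value matches the table."""
    expected = INVARIANTS.get(env)
    if expected is None:
        raise EnvironmentMismatch(f"unknown environment {env!r}")
    problems = [
        f"{key}: expected {want!r}, got {config.get(key)!r}"
        for key, want in expected.items()
        if config.get(key) != want  # a missing key counts as a mismatch
    ]
    if problems:
        raise EnvironmentMismatch("; ".join(problems))
```

The same function can be called twice: once in the build script, so a bad binary is never produced, and once at startup, so a binary that slipped through still refuses to boot.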

The numbers

Shipyard's first commit was May 16, 2026 — twenty-six days ago. Here's the lightwork repo's merge cadence over the past six weeks:

Bar chart of merged PRs per week in the lightwork repo, May 4 through June 11. Weekly totals: 17, 263, 354, 161, 175, and 30 in the partial final week

One honest note on attribution: the shipyard label that stamps every orchestrator-opened PR didn't exist until May 18, and shipyard was already doing work before the label existed. So I can't cleanly split that chart into "the machine's PRs" and "mine," and I won't pretend to. What the label does establish: since May 18, 524 merged PRs were opened end-to-end by shipyard workers. Treat that number as a floor, not a census.

The totals so far:

  • Lightwork: 1,016 merged PRs and 839 issues closed, with over 300 of the backlog issues filed by shipyard's own audit agents.
  • Shipyard: 282 merged PRs, 250 issues closed — and 252 of those PRs (89%) were opened by shipyard working on itself. The tool is its own primary user, which is exactly the feedback loop you want when shaking out an orchestrator.
  • The codebase split is telling: shipyard is ~48,000 lines, of which ~15,000 are markdown and ~26,000 are bash. The "source code" of an agent system is mostly carefully written specifications; the prompts are the program.

What did it cost? Shipyard keeps a local per-session token ledger, and it reports an estimated $357 across 97 orchestrator sessions (~71M tokens, priced at API list rates; about $213 of that on lightwork). Two caveats, both pointing the same way: early sessions under-recorded their token counts, so that figure is an undercount, and it excludes the interactive Claude Code sessions I drove myself. Even doubled, it's a striking number against 500+ merged PRs.
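For a rough sense of scale, the figures above can be turned into unit costs. This is back-of-the-envelope arithmetic, not something the ledger reports. It assumes the $213 lightwork share lines up with the 524 labeled PRs, which is only approximate, since the label count is a floor and the ledger is an undercount.

```python
total_cost = 357.0       # USD, ledger estimate across all orchestrator sessions
sessions = 97
lightwork_cost = 213.0   # USD, the share attributed to lightwork
lightwork_prs = 524      # labeled shipyard PRs merged on lightwork since May 18

per_session = total_cost / sessions
per_pr = lightwork_cost / lightwork_prs
per_pr_doubled = 2 * per_pr  # the "even doubled" case from the text

print(f"per session:      ${per_session:.2f}")     # $3.68
print(f"per merged PR:    ${per_pr:.2f}")          # $0.41
print(f"per PR, doubled:  ${per_pr_doubled:.2f}")  # $0.81
```

So even with the estimate doubled, the orchestrator's token spend works out to under a dollar per merged PR on these assumptions.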

The honest caveats

If the June 4th story has you tempted, you should know where it hurts first:

  • It's experimental. The README opens with a warning box, and it's earned. Safety comes from prompt discipline and layered guardrails — issue labels, worktree isolation, a no-hook-bypass rule, human-review gates — not from architectural sandboxing. You give it broad permissions by design.
  • CI minutes are real money. Parallel workers that retrigger full CI suites can burn through Actions minutes; shipyard now has config knobs specifically to cap that, because I learned the hard way.
  • Issue quality in, PR quality out. A clean stack trace or a well-scoped ticket gets a good fix. A vague one-liner gets a vague PR or a blocked bail. The leverage shifts your effort toward writing better issues — which, honestly, is effort well spent anyway.
  • Not everything is auto-fixable. Shipyard returns blocked on work it can't reproduce or safely scope, and parks it for a human. The win is the long tail of well-specified small-to-medium changes that used to sit in the backlog because no human's time ever justified them.
  • Review is still the gate you make it. Auto-merge respects branch protection. On lightwork I run with required CI checks; on a team repo you'd keep required reviewers, and shipyard's PRs queue up for humans like anyone else's.

The long tail is the point

Every codebase has a backlog like my June 4th list. The missing skip link. The tap target two pixels short. The doc that drifted from the code months ago. None of it is hard; none of it is urgent; none of it will ever beat a feature for a human's afternoon. So it sits — not because anyone decided it should, but because attention is the scarcest resource in the building, and that work never wins the auction.

That's the work an engineering loop is for. The unit of coordination is just a GitHub issue, which means the intake side (Sentry, Dependabot, support tooling, audits, humans) and the review side (branch protection, CODEOWNERS, required reviews) are things every team already has. Shipyard slots into the middle and burns down the well-specified work, while the humans keep every decision that actually needs one.

The shipyard repo is public at mattsears18/shipyard — the closed PRs labeled shipyard are the living demo — and the app it helped build lives at getlightwork.com. Installing the plugin is two commands:

claude plugin marketplace add mattsears18/shipyard
claude plugin install shipyard@shipyard

Start on a throwaway repo with --concurrency 1 and a billing alert, per the README. Then write one good issue, run /do-work, and watch a PR show up that you didn't have to shepherd home. That feeling is the whole pitch.
