I built shipyard so I could build Lightwork

June 11, 2026

On a Tuesday in May, I got up early and typed two commands into a terminal before leaving for work.

The first — /audit — inspected Lightwork, my side project, from both ends: browser-level passes over the live, deployed app (accessibility, SEO, privacy, performance) and code-level passes over the repository behind it (testing gaps, tech debt, observability, API surface, docs drift). Every finding became a labeled, severity-ranked GitHub issue, written like a competent engineer's bug report. The second command — /do-work — pointed a pool of autonomous workers at that backlog. Then I left the laptop running on my desk and headed to my actual job.

Shipyard spent the day working without me. By the time I checked back that evening, the repo had merged 116 pull requests and closed 121 issues — 110 of those PRs opened end-to-end by autonomous workers, each in its own git worktree, each merged by green CI without a human touching the keyboard. The median PR merged nine minutes after it opened — CI is path-filtered, so each change only runs the suites its files can actually affect. And it wasn't toy work: it hardened security on abuse-prone endpoints, fixed performance and accessibility violations, shipped offline support, and wrote test coverage over the social sign-in flows — the kind of changes that would each quietly eat an evening on their own.

After dinner I ran one more command — /my-turn — and got back a short, prioritized list of the things that actually needed a human: judgment calls, sign-offs, the handful of stuck items. Everything else had already shipped.

I didn't write a line of any of it. Most of the backlog wasn't written by me either — 93 of the 121 closed issues had been filed by the morning's audit agents. This post is about how an ordinary Tuesday got to look like that — because the machinery behind it is the most leverage I've ever gotten out of a tool I built.

Two projects, one bottleneck

For the last ten weeks I've been working on two projects at once, both passion projects built in the hours around everything else. None of this is the output of full-time work; it's what nights and weekends produced. The first is Lightwork: Expo (React Native) + Firebase, shipping to iOS, Android, and the web. The second is shipyard, an autonomous engineering loop built on Claude Code: it finds work, refines it into tickets, and turns the issue backlog into merged pull requests, with the human kept at exactly the points where judgment matters.

The relationship between the two is the point of this post: I built shipyard because I was the bottleneck in building Lightwork.

The app, and the four months that didn't ship

Lightwork is the product: an app for asking for help, and for showing up to give it. You join the groups already in your life — a church, a neighborhood association, a school parent group — or not, and when you need a hand, you ask: a ride to the airport, cookies for the hospital staff, gloves and a water bottle at the park on Saturday. Asks can be one-time or recurring, public or group-only, and they land on a map, so neighbors can see who needs a hand nearby, volunteer with a tap, and coordinate in a per-request chat. Real-time data on Firestore, offline support, push notifications, search via Algolia, social OAuth, localization, one codebase for iOS, Android, and a PWA.

Lightwork home feed of asks for help, captioned: Help asked plainly. Help given freely.

Lightwork map of nearby asks for help, captioned: See who needs a hand near you.

Lightwork groups list — an HOA, a church, a school, a volunteer crew — captioned: Built for the groups already in your life.

Lightwork group member roster, captioned: Everyone is there. Everyone can pitch in.

Lightwork ask-for-help form, captioned: Ask your community for help in 30 seconds.

Lightwork per-request message thread, captioned: Chat in real time on every request.

The screenshots above are shipyard's work too: it built the pipeline that seeds demo data, drives the app screen by screen, and composes the captioned, store-ready images — the same set submitted to the App Store and Google Play. Somewhere in the backlog is the next step: a demo video of the app, generated the same way, for the store listings.

Lightwork didn't start in React Native. The first version was about four months of work in FlutterFlow — a low-code platform where you build cross-platform apps visually instead of writing them by hand. I picked it because low-code was supposed to be the fast way to build an app. I built a lot of real features there, but the developer experience was atrocious: I had no way to write any meaningful tests, and too much of what FlutterFlow did was a black box — I didn't fully understand how it worked, and I didn't have full control over what it was doing. The prototype never quite reached the point where I could ship it. In April 2026 I started over in React Native, and the first task I completed was porting that FlutterFlow prototype into the new codebase.

Ten weeks of nights and weekends later, the repo has 1,016 merged pull requests, sits at version 2.51.0, and holds roughly 153,000 lines of TypeScript (about 80k of app code and 73k of tests), and I'm working through the App Store and Play Store launch checklists. Four months of low-code got me a prototype; ten weeks of spare hours with this setup got me to launch prep. One person, after hours.

The difference was Claude Code — and a lesson it taught me almost immediately: an agent session is brilliant at writing code and helpless at logistics. Drive one by hand and you become the conveyor belt: write the prompt, watch the checks, rebase when something else lands first, click merge, start the next one. The agent implements a well-specified issue faster than you can write the spec, and much faster than you can shepherd the result home. Multiply that across a backlog of hundreds of issues and the math is brutal: the human in the middle is the slowest part of every cycle.

The constraint wasn't the model. It was me.

So I built the machine that does my old job

Shipyard is an engineering loop — find work → refine it → do it → merge it → repeat — packaged as a Claude Code plugin so the loop runs without a human driving each step. It does four things:

Shipyard: audits, third-party services, user feedback, and manual entry feed an issue backlog; an orchestrator dispatches issue workers that ship the work

It finds work. /audit runs specialized agents across seventeen dimensions — performance (Lighthouse), security, accessibility, SEO, privacy/GDPR, PWA readiness, release readiness, tech debt, testing gaps, docs rot, observability, API surface health, and more — and files a labeled, severity-ranked GitHub issue for every finding. That's where most of the backlog that Tuesday burned down came from.
It refines work. /refine-issues takes raw input that isn't ready to be worked — a user-feedback form submission, a feature request with open questions — and classifies and rewrites it into an implementation-ready ticket. Anything touching real user feedback is gated behind a needs-human-review label.
It does work. /do-work is the orchestrator, and it plans before it builds. It starts by reading the entire open backlog and working out the order of attack — ranking issues by priority, and, crucially, spotting the ones that will touch the same files so it can sequence those one after another instead of letting parallel workers collide into merge conflicts. Then it keeps N workers in flight, each in its own isolated git worktree on its own branch, each implementing the smallest change that satisfies its issue, opening a PR, and arming auto-merge. While the workers build, the orchestrator watches the merge train: it monitors every open PR continuously, and the moment one develops a failing test, a conflict with freshly-landed work, or any other red check, it diverts a worker to fix that PR immediately — so the train keeps moving instead of backing up behind a stalled car.
It tells me when it's my turn. /my-turn is the human-facing counterpart to /do-work: it surveys the open PRs, the backlog, and recent comments to find the work that's blocked on me rather than on the machine — sign-offs, judgment calls, stuck work. And it doesn't just hand back a to-do list. It identifies the most important piece of open work, tells me the exact action to take to start on it, then walks me through the rest step by step until it's done. The other three commands take work off my plate; this one puts the right thing on it and shows me how to clear it.
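
The planning step described for /do-work can be pictured roughly as follows. This is a minimal sketch, not shipyard's actual code (the post describes shipyard as mostly markdown and bash). The `Issue` shape, the `predicted_files` field, and the `plan_waves` helper are all hypothetical names. It ranks issues by priority, then packs them into waves of at most N workers so that no two issues in the same wave touch the same file.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    priority: int  # assumed convention: lower number means more urgent
    # Hypothetical field: the files a fix is expected to touch.
    predicted_files: set[str] = field(default_factory=set)


def plan_waves(issues: list[Issue], max_workers: int) -> list[list[Issue]]:
    """Group issues into waves of at most max_workers with no shared files per wave."""
    waves: list[list[Issue]] = []
    for issue in sorted(issues, key=lambda i: (i.priority, i.number)):
        for wave in waves:
            has_room = len(wave) < max_workers
            collides = any(issue.predicted_files & other.predicted_files for other in wave)
            if has_room and not collides:
                wave.append(issue)
                break
        else:
            # No existing wave can take this issue, so it waits for a later one.
            waves.append([issue])
    return waves


backlog = [
    Issue(101, priority=1, predicted_files={"app/auth.ts"}),
    Issue(102, priority=1, predicted_files={"app/auth.ts", "app/session.ts"}),
    Issue(103, priority=2, predicted_files={"app/map.tsx"}),
    Issue(104, priority=3, predicted_files={"docs/README.md"}),
]

waves = plan_waves(backlog, max_workers=2)
wave_numbers = [[issue.number for issue in wave] for wave in waves]
print(wave_numbers)  # [[101, 103], [102, 104]]
```

Issues 101 and 102 both touch `app/auth.ts`, so the sketch sequences them into different waves instead of running them side by side, which is the collision-avoidance idea in the paragraph above.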

Shipyard at a glance: five stages from issue sources through refinement and human review, into the orchestrator, out to parallel workers in isolated worktrees, and into the PR pipeline with auto-merge

The design bet that makes it compose: everything is a GitHub issue. Shipyard doesn't care where an issue came from — an audit agent, a Sentry integration, Dependabot, a support tool, or a human. If a service can file a GitHub issue, shipyard can work it. Production error → issue → reproduced → fixed → PR → merged, with zero human steps if you let it.

And to be honest about that Tuesday: workers opened 123 PRs that day. 113 merged the same day, nine took days longer, and one never made it at all. The loop doesn't bat 1.000; it bats well enough that the stragglers are the part I notice.

Self-healing software

That design bet pays off hardest with the inputs that aren't me. Lightwork's production errors now file their own bug reports: Sentry is integrated with the GitHub repo, and Crashlytics watches the mobile apps — when either one catches an error or a crash, it automatically opens an issue, stack trace and all. From there the issue rides the same loop as everything else: refined, investigated, dispatched, fixed, merged. An error that surfaces in production can be patched within minutes of the first time it fires, without anyone triaging a dashboard first.

Users feed the loop too. Lightwork has a feedback form, and submissions flow straight into the issue backlog — feature requests, bug reports, all of it. In effect, the public can file issues against the repo: anyone using the app can put a request in front of shipyard. But that lane has guardrails. Shipyard only auto-works issues from trusted authors; anything from an untrusted source — the feedback form included — is held behind a human-review gate until I explicitly approve it. The security model in one sentence: anyone can ask, but only I decide what the machine builds. Once I've signed off, though, a user-reported bug or user-requested feature flows through exactly the same machinery as everything else — which means users of a one-person app get their bugs fixed and their feature requests shipped.
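
The "anyone can ask, but only I decide" rule can be sketched as a fail-closed check. This is an illustration under assumptions, not shipyard's implementation: the `needs-human-review` label name comes from the post, while the trusted-author list, the `human-approved` label, and every function name here are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout, except the review label, which the post mentions.
TRUSTED_AUTHORS = {"repo-owner", "audit-agent"}
REVIEW_LABEL = "needs-human-review"
APPROVED_LABEL = "human-approved"


@dataclass
class Issue:
    number: int
    author: str
    labels: set[str] = field(default_factory=set)


def intake(issue: Issue) -> Issue:
    """Label anything from an untrusted author so workers leave it alone."""
    if issue.author not in TRUSTED_AUTHORS:
        issue.labels.add(REVIEW_LABEL)
    return issue


def approve(issue: Issue) -> None:
    """The human sign-off: swap the review label for an approval label."""
    issue.labels.discard(REVIEW_LABEL)
    issue.labels.add(APPROVED_LABEL)


def can_auto_work(issue: Issue) -> bool:
    """Fail closed: untrusted issues are workable only after explicit approval."""
    if REVIEW_LABEL in issue.labels:
        return False
    return issue.author in TRUSTED_AUTHORS or APPROVED_LABEL in issue.labels


audit_finding = intake(Issue(1, author="audit-agent"))
feedback = intake(Issue(2, author="feedback-form"))

before_approval = can_auto_work(feedback)
approve(feedback)
after_approval = can_auto_work(feedback)
print(can_auto_work(audit_finding), before_approval, after_approval)  # True False True
```

The point of the design is the default: an issue from an unknown source is never workable unless a human has acted on it.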

The next step is closing that loop outward. The feedback form already captures the reporter's email address, so letting them know when their fix or feature lands — "the bug you filed on Tuesday was fixed on Wednesday" — is one small automation away. A feedback form that answers you back.

Shipyard builds shipyard

The strangest loop points back at itself: shipyard is built with shipyard. The vast majority of its merged PRs were opened by its own workers (the numbers are below). But the flywheel is the part I didn't plan: when a /do-work run on Lightwork hits friction — a worker misreads an ambiguous contract, the orchestrator mishandles an edge case, a helper script returns the wrong thing — that friction is automatically filed as an issue against the shipyard repo. The next /do-work run on shipyard burns those issues down. Friction encountered building my app becomes the spec for improving the tool, and the improved tool builds the app better. Dogfooding is eating what you cook; this is closer to a self-sharpening tool — every rough edge it hits while building Lightwork makes it a little better at building everything else.

Where the human fits in the loop

It's an autonomous loop, not an unsupervised one — the division of labor is explicit, and it cuts along what each side is actually good at.

What shipyard does that I can't: be in eight places at once, and never get bored. It keeps parallel workers running around the clock, watches every CI run to completion, rebases stale PRs, re-fixes failing checks, applies every label and changelog convention without ever forgetting one, and grinds through the long tail of small, well-specified fixes that would never justify an hour of a human's day. None of that work is hard — there's just far more of it than one person can physically do.

What I do that shipyard can't: decide what's worth building, and why. I write the issues (or approve the ones it drafts), make the product and design calls, sign off on anything derived from real user feedback, review the PRs where correctness has real stakes, and unstick the work it returns as blocked.

And that hand-off has its own command: /my-turn, the fourth item in the list above. After a day of the machine working, it's how I find out what the loop actually needs from a human, in priority order, instead of trawling GitHub trying to reconstruct it myself. It also walks me through each item, step by step. When Lightwork needed Facebook login configured, it didn't say "set up OAuth" and wish me luck; it sat me down: open this Meta developer console page, create this type of app, paste exactly this redirect URI, drop these two values into these env vars, now run this command to confirm the wiring works. The machine handles everything that doesn't require me; for the few things that genuinely do (admin consoles, account ownership, judgment) it shrinks my job to following directions. The division of labor in one line: I gate what it builds; it tells me where I'm the bottleneck, then walks me through getting out of the way.

It built the platform, not just the patches

It would be easy to read all of this as "an agent that does small fixes." The part that changed my mind about what these loops are for is that shipyard built Lightwork's infrastructure — the unglamorous platform work that usually eats weeks.

It built the entire CI/CD pipeline: lint, typecheck, and the unit suite on every PR; end-to-end tests against Firebase emulators; Lighthouse and bundle-size budgets; store-credential validation. The E2E suite is automatically split into parallel shards to keep wall-clock time down — and when the shards' run times drift out of balance, the pipeline notices and rebalances them itself.
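
One plausible way to do the shard rebalancing is a greedy longest-first assignment plus a drift threshold. The post does not say which method Lightwork's pipeline uses, so treat this as a sketch under assumptions: the spec names, the recorded durations, and the 1.25 threshold are invented for illustration.

```python
import heapq

DRIFT_THRESHOLD = 1.25  # hypothetical: slowest shard may run 25% over the mean


def shard_loads(shards: list[list[str]], durations: dict[str, float]) -> list[float]:
    """Total recorded seconds per shard."""
    return [sum(durations[spec] for spec in shard) for shard in shards]


def shard_drift(shards: list[list[str]], durations: dict[str, float]) -> float:
    """Ratio of the slowest shard to the mean shard; 1.0 means perfectly even."""
    loads = shard_loads(shards, durations)
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def balance_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedy longest-first: give each spec to the currently lightest shard."""
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    for spec, seconds in sorted(durations.items(), key=lambda item: (-item[1], item[0])):
        load, index = heapq.heappop(heap)
        shards[index].append(spec)
        heapq.heappush(heap, (load + seconds, index))
    return shards


recorded = {
    "auth.spec": 120.0, "chat.spec": 90.0, "map.spec": 60.0,
    "groups.spec": 45.0, "feed.spec": 30.0, "offline.spec": 15.0,
}
stale = [["auth.spec", "chat.spec", "feed.spec"], ["groups.spec", "map.spec", "offline.spec"]]

stale_drift = shard_drift(stale, recorded)
needs_rebalance = stale_drift > DRIFT_THRESHOLD
balanced = balance_shards(recorded, shard_count=2)
balanced_drift = shard_drift(balanced, recorded)
print(round(stale_drift, 2), needs_rebalance, shard_loads(balanced, recorded))
```

Wall-clock time for a sharded suite is set by the slowest shard, which is why the drift measure compares the maximum load to the mean.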

Lightwork's CI pipeline graph: changed-path and deploy-channel detection, lint/typecheck and unit tests, parallel web E2E shards with a shard-drift check, Firestore rules and Cloud Functions deploys, then OTA update or store release plus web deploys to Vercel

The deploy side is just as hands-off. Every commit to main deploys automatically to the isolated test environment. When release-please cuts a version — complete with a detailed changelog, from which user-friendly release notes are drafted automatically — that release triggers the mobile deployments and a production web deploy together. And for mobile, the pipeline decides for itself what kind of release a change actually requires: if the native runtime changed, it builds binaries and pushes a store release through the App Store and Google Play; if it didn't, it ships the same code instantly as an over-the-air update. I never make that call — it just does the correct thing. I iterated on the whole pipeline the same way everything else here works: file an issue, let a worker ship the change.
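
The store-release-or-OTA decision comes down to one comparison: did anything that feeds the native runtime change since the last store build? The post does not say how Lightwork detects that, so the hashing approach below is a stand-in, and the list of native inputs and all function names are hypothetical.

```python
import hashlib

# Hypothetical list of paths assumed to affect the native binary.
NATIVE_INPUTS = ("app.json", "ios/", "android/")


def runtime_fingerprint(files: dict[str, bytes]) -> str:
    """Hash only the files that feed the native build, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        if path.startswith(NATIVE_INPUTS):
            digest.update(path.encode())
            digest.update(b"\0")
            digest.update(files[path])
            digest.update(b"\0")
    return digest.hexdigest()


def release_kind(last_store_build: str, current: str) -> str:
    """A changed native runtime needs new binaries; otherwise ship over the air."""
    return "store-release" if last_store_build != current else "ota-update"


base = {
    "app.json": b'{"plugins": []}',
    "ios/Podfile": b"pods v1",
    "src/Feed.tsx": b"feed v1",
}
js_only_change = {**base, "src/Feed.tsx": b"feed v2"}
native_change = {**base, "ios/Podfile": b"pods v2"}

js_release = release_kind(runtime_fingerprint(base), runtime_fingerprint(js_only_change))
native_release = release_kind(runtime_fingerprint(base), runtime_fingerprint(native_change))
print(js_release, native_release)  # ota-update store-release
```

A JavaScript-only edit leaves the fingerprint untouched and ships as an over-the-air update, while a change under `ios/` forces a store release.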

It also wrote the safety net that makes the speed survivable: over 3,000 unit tests, over 300 end-to-end tests, and about 50 smoke checks — a post-release suite that runs against production plus Maestro flows on real mobile builds. That's the part people miss about merging on green CI: an autonomous loop is only as trustworthy as what "green" means. And yes, an agent writing tests for its own code is the obvious worry — which is why one of the audit dimensions exists specifically to hunt tests that lie: empty, tautological, mock-only, assertion-free. The suite that gates the merges gets audited by a different set of eyes than the ones that wrote it. Shipyard raised the bar it has to clear, then kept clearing it — which is how the repo absorbs hundreds of merges a month without breaking underneath me.

Two more pieces of platform work I'd have dreaded doing alone. First, fully isolated test and production environments — separate Firebase projects, separate bundle IDs, separate OAuth apps — enforced by an invariant table that's checked at build time and at runtime, so a half-configured binary refuses to boot instead of quietly talking to the wrong database. Second, authentication: email plus Google, Apple, and Facebook social logins, working across iOS, Android, and the web. Anyone who has wired OAuth redirects, URL schemes, and per-platform app registrations across three platforms and two environments knows exactly how much tedium is buried in that sentence — the code was shipyard's; the console clicks were that /my-turn walkthrough from earlier.
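
An invariant table of the kind described for environment isolation might look like the sketch below. Every project ID, bundle ID, and key name here is a made-up placeholder, not Lightwork's real configuration; the idea being illustrated is only that a mixed configuration raises instead of booting.

```python
# Hypothetical invariant table: each environment must match every value exactly.
INVARIANTS = {
    "test": {
        "firebase_project": "example-app-test",
        "bundle_id": "com.example.app.test",
        "oauth_client": "oauth-test",
    },
    "production": {
        "firebase_project": "example-app-prod",
        "bundle_id": "com.example.app",
        "oauth_client": "oauth-prod",
    },
}


class MisconfiguredBuild(RuntimeError):
    """Raised when a build mixes values from different environments."""


def violations(environment: str, config: dict[str, str]) -> list[str]:
    """List every key whose value differs from the invariant table."""
    expected = INVARIANTS.get(environment)
    if expected is None:
        return [f"unknown environment {environment!r}"]
    return [
        f"{key}: expected {want!r}, got {config.get(key)!r}"
        for key, want in expected.items()
        if config.get(key) != want
    ]


def assert_bootable(environment: str, config: dict[str, str]) -> None:
    """Call at build time and again at app start; refuse to continue on any mismatch."""
    problems = violations(environment, config)
    if problems:
        raise MisconfiguredBuild("; ".join(problems))


# A production bundle that still points at the test Firebase project.
half_configured = {
    "firebase_project": "example-app-test",
    "bundle_id": "com.example.app",
    "oauth_client": "oauth-prod",
}

try:
    assert_bootable("production", half_configured)
    refused = None
except MisconfiguredBuild as error:
    refused = str(error)

print(refused)
```

Running the same check in two places is what catches both a bad build and a binary whose configuration changed after it was built.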

And one more, since you're looking at it: shipyard built this site. When I wanted somewhere to publish this post, shipyard rebuilt my personal site from the ground up — Next.js, the design system, RSS, the SEO and accessibility plumbing — and had it live in a day. Every pull request ever merged into the site's repo was opened by shipyard.

The numbers

Shipyard's first commit was May 16, 2026 — twenty-six days ago. Here's the Lightwork repo's merge cadence over the past six weeks:

Bar chart of merged PRs per week in the Lightwork repo, May 4 through June 11. Weekly totals: 17, 263, 354, 161, 175, and 30 in the partial final week

The totals so far:

Lightwork: 1,016 merged PRs and 839 issues closed, with over 300 of the backlog issues filed by shipyard's own audit agents.
Shipyard: 282 merged PRs, 250 issues closed — and 252 of those PRs (89%) were opened by shipyard working on itself. The tool is its own primary user, which is exactly the feedback loop you want when shaking out an orchestrator.
The codebase split is telling: shipyard is ~48,000 lines, of which ~15,000 are markdown and ~26,000 are bash. The "source code" of an agent system is mostly carefully written specifications: the prompts are the program.

What did it cost? Shipyard keeps a local per-session token ledger, and it reports an estimated $357 across 97 orchestrator sessions (~71M tokens, priced at API list rates; about $213 of that on Lightwork). Two caveats, both of which make that an undercount: early sessions under-recorded their token counts, and the ledger excludes the interactive Claude Code sessions I drove myself. Even doubled, it's a striking number set against everything described above.

Tokens aren't the whole bill, though. Lightwork's CI has logged nearly 14,000 workflow runs, and back-of-envelope math from the run logs puts that at roughly 80,000 GitHub Actions minutes — call it another ~$650 at list price. That's the real cost shape of an autonomous loop: tokens to write the code, CI minutes to prove it — and proving it cost more than writing it.
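
For a sense of scale, the figures quoted above can be divided out. This is back-of-envelope arithmetic on the post's own rounded numbers, so the results are rough: the token ledger is an acknowledged undercount, "nearly 14,000" runs is approximate, and not every one of the 1,016 merged PRs was opened by a worker.

```python
# Figures quoted in the post (Lightwork only, all approximate).
token_cost_usd = 213.0
ci_cost_usd = 650.0
ci_minutes = 80_000
workflow_runs = 14_000
merged_prs = 1_016

implied_rate_per_minute = ci_cost_usd / ci_minutes    # about $0.008 per minute
minutes_per_run = ci_minutes / workflow_runs          # a little under 6 minutes
ci_cost_per_pr = ci_cost_usd / merged_prs             # roughly 64 cents
token_cost_per_pr = token_cost_usd / merged_prs       # roughly 21 cents
ci_to_token_ratio = ci_cost_usd / token_cost_usd      # about 3x

print(f"CI rate:       ${implied_rate_per_minute:.4f}/min")
print(f"CI per run:    {minutes_per_run:.1f} min")
print(f"CI per PR:     ${ci_cost_per_pr:.2f}")
print(f"Tokens per PR: ${token_cost_per_pr:.2f}")
print(f"CI vs tokens:  {ci_to_token_ratio:.1f}x")
```

On these numbers, each merged PR cost roughly three times as much to verify as to write, which is the shape the paragraph above describes.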

The honest caveats

If that Tuesday has you tempted, you should know where it hurts first:

It's experimental. The README opens with a warning box, and it's earned. Safety comes from prompt discipline and layered guardrails — issue labels, worktree isolation, a no-hook-bypass rule, human-review gates — not from architectural sandboxing. You give it broad permissions by design.
CI minutes are real money. Parallel workers that retrigger full CI suites can burn through Actions minutes; shipyard now has config knobs specifically to cap that, because I learned the hard way.
Issue quality in, PR quality out. A clean stack trace or a well-scoped ticket gets a good fix. A vague one-liner gets a vague PR, or a worker that bails out and returns the issue as blocked. The leverage shifts your effort toward writing better issues, which, honestly, is effort well spent anyway.
Not everything is auto-fixable. Shipyard returns blocked on work it can't reproduce or safely scope, and parks it for a human. The audits aren't infallible either — some findings get closed as wrong or not worth fixing. That's fine: triaging a written, severity-ranked issue takes seconds. The win is the long tail of well-specified small-to-medium changes that used to sit in the backlog because no human's time ever justified them.
Review is still the gate you make it. Auto-merge respects branch protection, so the gate is whatever you require. On Lightwork I should be explicit about my choice: CI is the merge gate — a green autonomous PR merges without me reading every diff, and I review selectively after the fact, where correctness has real stakes. The test suite is what makes that tenable, and it's a deliberate tradeoff for a solo, pre-launch app. On a team repo you'd keep required reviewers, and shipyard's PRs queue up for humans like anyone else's.

The long tail is the point

Every codebase has a backlog like the one that Tuesday burned down. The 42-pixel tap target. The render-blocking stylesheet. The endpoint nobody rate-limited. The doc that drifted from the code months ago. None of it is hard; none of it is urgent; none of it will ever beat a feature for a human's afternoon. So it sits — not because anyone decided it should, but because attention is the scarcest resource in the building, and that work never wins the auction.

That's the work an engineering loop is for. The unit of coordination is just a GitHub issue, which means the intake side (Sentry, Dependabot, support tooling, audits, humans) and the review side (branch protection, CODEOWNERS, required reviews) are things every team already has. Shipyard slots into the middle and burns down the well-specified work, while the humans keep every decision that actually needs one.

The shipyard repo is public at mattsears18/shipyard — the closed PRs labeled shipyard are the living demo — and the app it helped build lives at getlightwork.com. Installing the plugin is two commands:

claude plugin marketplace add mattsears18/shipyard
claude plugin install shipyard@shipyard

Start on a throwaway repo with --concurrency 1 and a billing alert, per the README. Then write one good issue, run /do-work, and watch a PR show up that you didn't have to shepherd home. That feeling is the whole pitch.