Trigger: Paul 2026-04-25 10:45 PDT — "can we please build in a cleaner method for handling all of these package.json conflicts? Am I wrong or is it like the most offending conflicting file? ... I'm looking for a more robust solution which will enable more concurrency not limit or hinder it."
Paul's right. Hard data from the last day on dev:
| Rank | File | Commits touching it (last 24h) |
|---|---|---|
| 1 | package.json | 34 |
| 2 | functions/api/chat/tool-schemas.ts | 7 |
| 3 | wrangler.toml | 6 |
| 3 | src/components/ChatDrawer.jsx | 6 |
| 3 | functions/api/chat/tools-retrieval.ts | 6 |
| 3 | functions/api/chat/tools-render.ts | 6 |
| 3 | functions/api/chat/system-prompt.ts | 6 |
package.json is touched about 5x as often as the next file (34 commits vs. 7). And the conflict is concentrated on a single field:
scripts.test = "<7,674-char string of 150 commands && && && ...>"
Every PR that adds a test command appends to this single line. Two parallel PRs both modify the same line of the same file → guaranteed conflict on every overlap.
The &&-chain is a degenerate one-line append target. Three structural problems compound:
1. No DAG / parallelism — even when CI runs, every test runs sequentially via &&. We don't even get the speedup from running tests in parallel; we just get the conflict cost.
2. No discovery — every new test must be manually added to the chain. This is what makes parallel PRs collide.
3. No package boundaries — Firmwatch is single-package (firmwatch-build); there's no apps/foo/package.json we could move tests into.
Consensus from the research (Vercel, Stripe, Linear, Google/Bazel, Meta/Buck, Shopify, Strapi): the "monolithic test chain" pattern doesn't scale. The common pattern across these teams is discovery-based test runners + DAG orchestration + per-package boundaries.
I'm giving you the full spectrum from "5-min hack" to "real architectural fix" so you can pick where you want to land.
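Option 1: JSON merge driver
What: register a git merge driver for package.json that runs jq (or a small Node script) to do a semantic union merge on the scripts object. The driver knows that scripts with identical keys and values merge cleanly; only true conflicts (same key, different value) need human resolution. One caveat: two PRs that each append to scripts.test change the same key to different values, so to auto-merge those the driver also has to split that value on && and union the commands rather than compare it as one string.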
```
# .gitattributes
package.json merge=json-union
```

```ini
# .git/config (per-clone or via a script)
[merge "json-union"]
    driver = node scripts/json-union-merge.js %O %A %B %P
```
Pros: surgical, no new deps, no test-runner change, deployable in 1-2 hours, ~50% of current conflicts go away (the append-to-test-chain ones become auto-merged).
Cons: still a 7,674-char one-liner. Still slow (sequential tests). Doesn't help when two PRs add the SAME script key (rare but possible).
Net: the right immediate fix. We were going to do this anyway via FIRM-261.
Option 2: glob-discovery test runner
What: stop appending to scripts.test. Replace it with a one-liner:
"test": "node scripts/run-all-tests.mjs"
Then run-all-tests.mjs glob-discovers tests from filesystem patterns: `scripts/test_*.js`, `scripts/test_*.mjs`, `tests/**/*.test.{ts,jsx}`, etc.
New tests just need a new file with the right name pattern. No package.json edit.
Pros: package.json scripts.test becomes static. Conflicts on it disappear almost entirely. Most other scripts.* entries (smoke, canary, render-tool-specific) are also rarely-touched. Pure mechanical change.
Cons: requires teaching the orch's spec authoring step + Strata's "always-add-test-cmd" pattern to NOT append to package.json — instead, just create the test file with the right name. Ticket-template change. Also: discovery order may differ from current explicit ordering (test_* alphabetical instead of "added in this order"). Probably fine, but verify.
Net: the highest-ROI fix. Most conflict pain goes away with one config refactor + a docs update.
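A sketch of the planning half of run-all-tests.mjs, kept as a pure function so it is easy to check. The regexes are my reading of the patterns listed above, the extension-to-runner table is an assumption (python3 tests are left out because their naming pattern is not stated), and planTests is a hypothetical name.

```javascript
// Patterns assumed from the brief: scripts/test_*.{js,mjs} and tests/**/*.test.{ts,jsx}
const TEST_PATTERNS = [
  /^scripts\/test_[^/]*\.(js|mjs)$/,
  /^tests\/(.+\/)?[^/]+\.test\.(ts|jsx)$/,
];

// Assumed mapping from file extension to the command prefix that runs it.
const RUNNERS = {
  js: ["node"],
  mjs: ["node"],
  ts: ["npx", "tsx"],
  jsx: ["npx", "tsx"],
};

// Given repo-relative paths, return the commands to run, in a stable
// (alphabetical) order that does not depend on who added which test when.
function planTests(paths) {
  return paths
    .map((p) => p.split("\\").join("/")) // normalize Windows separators
    .filter((p) => TEST_PATTERNS.some((re) => re.test(p)))
    .sort()
    .map((p) => {
      const ext = p.slice(p.lastIndexOf(".") + 1);
      return [...RUNNERS[ext], p];
    });
}
```

In the real script the paths would come from a recursive directory walk (for example fs.readdirSync with the recursive option on recent Node versions), and each planned command would be spawned in turn, stopping at the first non-zero exit to mirror the current && semantics.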
Option 3: vitest migration
What: adopt vitest as the test runner. Configure workspace globs in vitest.config.ts. Tests are discovered by file-path patterns, with no manual chain.
Pros: real parallelism (vitest runs tests concurrently natively), watch mode, snapshot support, modern test ergonomics. Plus the auto-discovery benefit of Option 2.
Cons: ~120 of our 150 tests are written as plain Node scripts (node scripts/test_X.js style), some npx tsx, some python3. Migrating them all to vitest's expect/describe/it API is real work — call it ~3 hours per 10 tests = ~45 hours to migrate everything. Or we keep the Node-script tests on a parallel track and use vitest only for the new tests, gradual migration. Hybrid is probably the right path.
Cons cont'd: does NOT solve the multi-language issue (we have python3 + tsx + node tests). Vitest is JS/TS only.
Net: the right v2 direction but probably not the immediate fix. Migrate gradually.
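For the hybrid path, a minimal config sketch that limits vitest to newly written tests might look like the following. The include glob is an assumption about where new tests would live, not the repo's actual layout; the same syntax is valid in vitest.config.ts.

```javascript
// vitest.config.js (sketch; the include glob is an assumption)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Only files matching these globs are collected by vitest. Legacy
    // node / tsx / python3 script tests stay on their existing runner
    // until they are migrated.
    include: ["tests/**/*.test.{ts,jsx}"],
  },
});
```

With this shape, adding a vitest test is just adding a matching file, and the legacy scripts remain untouched until someone migrates them.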
Option 4: Turborepo
What: declare a turbo.json with a test task (under tasks in Turborepo 2.x; the key was pipeline in 1.x). Turborepo runs tests in parallel based on the dependency graph, with caching. Per-package package.json files live in packages/foo/, apps/firmwatch-pages/, etc. The root package.json becomes the orchestrator.
Pros: what Vercel themselves use. Native CI caching (re-running unchanged tests is a no-op). Per-package conflict isolation — adding a test to packages/scrapers/package.json doesn't conflict with adding one to packages/web/package.json. Real parallelism.
Cons: requires restructuring the repo into packages — splitting functions/, src/, scripts/, e2e/ along package boundaries. That's a real refactor (~3-5 days of careful work). Risk of breaking deploys, smoke tests, the orchestrator, etc.
Net: the right v3 direction once we feel like the project's complexity warrants monorepo discipline. Today probably premature.
Option 5: GitHub merge queue
What: GitHub's native merge queue serializes PR merges: it builds a temporary branch from the latest dev tip plus the queued PR(s) and runs CI on that before merging. This eliminates the "auto-merge enabled but stuck on conflict" class entirely, because each PR is re-integrated with the latest tip before the merge is attempted.
Pros: GitHub-native, no orch changes, eliminates the auto-merge race wholesale.
Cons: forced sequencing — ironically reduces concurrency at the merge step (only one PR merges at a time per queue). Adds 2-5 min latency per PR. Plus our PRs come from non-trunk branches and many touch package.json — the queue would still hit auto-rebase failures and we'd need a rebase strategy underneath it.
Net: would mostly eliminate Class 1 (auto-merge race) but doesn't solve the underlying package.json contention. Best paired with Option 1 or 2.
| Goal | Build | Cost | Risk | Reduces conflicts by |
|---|---|---|---|---|
| Fastest visible improvement | Option 1 (JSON merge driver) | 1-2 hours | Low | ~50% |
| Best ROI | Option 1 + Option 2 (driver + discovery) | 1 day | Low | ~85-90% |
| Long-term scalability | Option 1 + Option 2 + gradual Option 3 (vitest migration) | Multi-week | Medium | ~95%+ |
| Architectural completeness | All of the above + Option 4 (turborepo) | 1-2 weeks on top of the above | Medium-high | ~99% |
Build Option 1 + Option 2 NOW. Skip Option 3/4 until they earn their way in.
Reasoning:
Spec 037 (filed 30 min ago) handles the symptom (auto-merge race + zombie coordinators); this brief proposes the root-cause fix for what's CAUSING the high conflict rate.
We've been bitten by "too clever" merge resolution before: yesterday a Claude Code run hallucinated a "deep API incompatibility" between two near-identical FeedbackCard files (FIRM-526). The risk with Option 1 is that the JSON merge driver gets a real conflict wrong and silently picks the wrong branch.
Mitigation: the driver should be conservative — it auto-resolves ONLY when both sides added different keys to the same object. If both sides modified the SAME key with different values, it falls through to standard conflict markers and humans handle it.
For Option 2, the risk is that some test was relying on its position in the chain (e.g., setup test must run first). Need a one-shot audit. For our current tests, there's no inter-test ordering dependency I'm aware of — everything is independent.
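The one-shot audit could be sketched as below: compare the current chain against the discovered file list and report commands that discovery would silently drop, plus tests whose position changes. auditChain is a hypothetical helper, and it assumes each chain command names its test file as a whitespace-separated token and that no command contains a literal && inside quotes.

```javascript
// chain: the current scripts.test string.
// discovered: repo-relative test paths in the order the new runner would execute them.
function auditChain(chain, discovered) {
  const commands = chain.split("&&").map((c) => c.trim()).filter(Boolean);
  const mentions = (cmd, path) => cmd.split(/\s+/).includes(path);

  // Commands in the chain that no discovered file accounts for; these would
  // stop running after the switch unless a pattern is added for them.
  const notDiscovered = commands.filter(
    (cmd) => !discovered.some((p) => mentions(cmd, p))
  );

  // Discovered files in the order the chain runs them today (deduplicated).
  const inChainOrder = [
    ...new Set(commands.flatMap((cmd) => discovered.filter((p) => mentions(cmd, p)))),
  ];
  // The same files in the order discovery would run them.
  const inDiscoveryOrder = discovered.filter((p) => inChainOrder.includes(p));
  // Files whose position differs between the two orders.
  const moved = inDiscoveryOrder.filter((p, i) => p !== inChainOrder[i]);

  return { notDiscovered, moved };
}
```

An empty notDiscovered list means nothing is dropped by the switch; a non-empty moved list is the shortlist to check by hand for hidden ordering dependencies.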
If you say go, this becomes spec 038. Tickets:
- JSON merge driver (.gitattributes + ~50 LOC Node script + .git/config registration via setup script for new clones)
- run-all-tests.mjs glob-discovery script + replace package.json scripts.test with single-line invocation
- […] package.json test field to the new orchestrator script call (one-shot)

5-6 tickets, 1-2 layers, all sonnet, ~4-6h wall-clock.
Want me to file?