Mastering Claude Co-Work
Module 15 of 18

Sub-Agents and Parallel Pipelines

Spawn multiple agents working in parallel — and build the research pipeline that scales beyond a single session.

What you'll learn

  • Explain how Co-Work sub-agents work and identify the current mechanism: Claude Code Remote sessions
  • Design a parallel task execution pipeline for a real multi-document research or analysis problem
  • Build a parallel analysis pipeline that processes 3+ items simultaneously and synthesizes results
  • Evaluate the tradeoffs between parallel sub-agent execution and sequential task execution — including token cost implications

The Architecture Ceiling — and the Way Through It

Diagram: a parent Co-Work session at the top spawns 3–5 Claude Code Remote sessions, each processing one item, with results returning to the parent for synthesis.

Every capability you have built so far runs in a single session: one context window, processing work sequentially. That is fine for most tasks. But some problems are fundamentally parallel in nature — a collection of documents that need individual analysis, a queue of bugs that each need a fix, a set of market segments that each need a research brief.

For these problems, sequential execution is not just slow — it is architecturally wrong. The items do not depend on each other. There is no reason item 3 should wait for item 2 to finish. Sub-agents change the equation.

Source Note: Claude Code Remote

Claude Code Remote is the current sub-agent mechanism — both Jenny (Anthropic) and Swyx have described this pattern in detail. It is not yet in Co-Work's official scheduling documentation and may evolve as the platform matures. The sub-agent capability itself is verified in 01-cowork-overview.md; the Claude Code Remote mechanism is attributed to these practitioners and may change.

How Sub-Agents Work

Co-Work can spawn sub-agents to run tasks in parallel — multiple instances working simultaneously. The official Co-Work documentation describes this capability as "sub-agent coordination" and "parallel workstreams."

The current mechanism, described by Jenny (Anthropic) and Swyx in technical discussions, uses Claude Code Remote as the sub-agent engine. The architecture works like this:

  1. The parent Co-Work session identifies the collection of parallel work items
  2. It writes a template prompt describing what to do with each item
  3. It spawns a separate Claude Code Remote session for each item
  4. Each session runs independently — accessing files, using tools, producing output
  5. The parent Co-Work session monitors all running sessions, collects their outputs, and synthesizes a final result

The key constraint: each session is independent. Sub-agents cannot communicate with each other mid-run. They receive their prompt from the parent, do their work, and return results to the parent. This means sub-agents are suited for tasks where the items do not depend on each other's intermediate outputs.
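The fan-out/fan-in shape of steps 1–5 can be sketched in ordinary Python. This is an analogy for the architecture, not the actual Co-Work or Claude Code Remote API; `run_subagent` is a hypothetical stub standing in for one independent remote session:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(item: str, template: str) -> str:
    # Hypothetical stand-in for one Claude Code Remote session. A real
    # sub-agent would execute the rendered prompt with file and tool
    # access; this stub just echoes a result so the shape is visible.
    prompt = template.format(item=item)
    return f"analysis of {item}"

def parallel_pipeline(items: list[str], template: str) -> str:
    # Fan out: one independent worker per item. Workers cannot talk to
    # each other mid-run -- they only receive a prompt and return output.
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        results = list(pool.map(lambda i: run_subagent(i, template), items))
    # Fan in: the parent collects all outputs and synthesizes them.
    return "\n".join(results)

print(parallel_pipeline(["doc-a", "doc-b", "doc-c"],
                        "Summarize {item} in three bullet points."))
```

Note that the parent only touches the results after every worker returns — mirroring the constraint that sub-agents cannot share intermediate outputs.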

Parallel vs. Sequential: The Design Decision

Timeline diagram: Sequential (item 1 then item 2 then item 3, 3x time) vs Parallel (items 1, 2, 3 simultaneously, 1x time)

The choice between parallel and sequential execution is an architectural decision, not a performance preference. Use parallel execution when:

  • Tasks are independent — item B does not need item A's output before it can start
  • Tasks follow a common template — the same prompt works for every item in the collection
  • The collection is uniform — all items are of the same type (all documents, all bugs, all interviews)
  • Speed matters and cost is acceptable — parallel execution finishes faster but costs proportionally more in tokens

Use sequential execution when:

  • Task B uses task A's output — the work is a chain, not a collection
  • Items are heterogeneous and require different prompts
  • You need each result reviewed before proceeding to the next
  • Context from earlier items should inform later ones
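One way to internalize the two checklists above is as a gate: choose parallel only when every condition on the first list holds. A minimal sketch (the function and flag names are illustrative, not a Co-Work API):

```python
def execution_mode(independent: bool, common_template: bool,
                   uniform: bool, cost_acceptable: bool) -> str:
    # Parallel only when all four conditions hold; any failed condition
    # (chained outputs, heterogeneous items, etc.) falls back to sequential.
    if independent and common_template and uniform and cost_acceptable:
        return "parallel"
    return "sequential"

print(execution_mode(True, True, True, True))    # a uniform document batch
print(execution_mode(False, True, True, True))   # task B needs task A's output
```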

The Bug Triage Pipeline

Diagram: a crash reporting tool feeds Co-Work, which writes markdown prompt files and spawns a Claude Code Remote session per bug; each session creates a PR, and the parent summarizes all fixes.

The most technically sophisticated pipeline documented by the community comes from Swyx, who describes a bug triage workflow. It demonstrates what sub-agents enable at scale:

  1. Co-Work monitors a crash reporting tool (such as Sentry) via a connector
  2. For each new bug report: Co-Work writes a markdown prompt file describing the issue, the relevant code context, and the expected behavior
  3. Co-Work spawns a separate Claude Code Remote session for each bug
  4. Each session reads its prompt, pulls the relevant code, attempts a fix, and creates a pull request
  5. The parent Co-Work session monitors all sessions, collects the PR links, and generates a summary report of all attempted fixes

The prompt that triggers this: "Go to [crash reporting tool]. Find all bugs filed today. For each bug, start a Claude Code Remote session, have it read the issue description, find the relevant code, attempt a fix, and create a pull request."

Community Pattern — Not in Official Docs

The bug triage pipeline is a pattern described by Swyx in a community discussion (Future of Software). It is an advanced community practice, not from official Anthropic documentation. The capability it describes — sub-agent coordination with parallel workstreams — is verified. The specific implementation pattern is community-sourced and attributed to Swyx.
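Step 2 of the pipeline — writing one markdown prompt file per bug — might look like the following sketch. The bug fields, filenames, and template are illustrative assumptions, not a Sentry schema or an official Co-Work format:

```python
from pathlib import Path

# Hypothetical bug records; field names are made up for illustration.
bugs = [
    {"id": "BUG-101", "title": "Crash on empty upload",
     "trace": "ValueError in upload.py:42"},
    {"id": "BUG-102", "title": "Timeout on large export",
     "trace": "TimeoutError in export.py:87"},
]

TEMPLATE = """\
# Fix {id}: {title}

Stack trace:
{trace}

Read the issue above, find the relevant code, attempt a fix,
and open a pull request referencing {id}.
"""

out_dir = Path("prompts")
out_dir.mkdir(exist_ok=True)
for bug in bugs:
    # One markdown prompt file per bug; each file seeds one sub-agent session.
    (out_dir / f"{bug['id']}.md").write_text(TEMPLATE.format(**bug))
```

Because every file is generated from the same template, each sub-agent session receives a uniform, self-contained brief — the property that makes the fan-out safe.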

The Jenny Monday Insights Pattern

Jenny, a researcher at Anthropic, describes a parallel research pattern she uses in her own work. Each Monday morning, she asks Co-Work to analyze a folder of UXR (user experience research) interview transcripts alongside recent Reddit discussions about the product. Co-Work processes multiple sources simultaneously, each as a separate analysis stream, and synthesizes them into a unified insights brief.

The structure: "Look in this folder of UXR interviews and on Reddit — tell me the main insights." What sounds like a simple request triggers parallel processing across multiple sources. Without sub-agents, this would mean hours of reading each source serially. With parallel execution, the consolidated brief arrives while the first coffee is still hot.

Token Economics in Parallel Workloads

Parallel execution is faster but not cheaper. Each sub-agent session loads its own copy of the shared context, so five parallel sub-agents consume approximately five times the tokens of running the same work in a single sequential session. The wall-clock time drops by a factor of five; the token cost does not.

Jack Roberts uses a framing that clarifies the decision: think of token spend as employee cost. Running five sub-agents in parallel is like hiring five specialists to each work on one problem for an hour — versus hiring one specialist to work on all five problems for five hours. Same total labor cost, very different delivery speed.
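A back-of-envelope model shows where the multiplier comes from: every sub-agent session loads its own copy of the shared context, while a single sequential session loads it once. All numbers below are illustrative assumptions, not measured Co-Work figures:

```python
# Illustrative token budget for one pipeline run.
shared_context = 50_000   # codebase / instructions loaded once per session
work_per_item = 5_000     # tokens actually spent analyzing one item
n_items = 5

# One session reads the context once, then works through all items.
sequential = shared_context + n_items * work_per_item

# Each of the n parallel sessions reloads the full shared context.
parallel = n_items * (shared_context + work_per_item)

print(sequential)                        # 75000
print(parallel)                          # 275000
print(round(parallel / sequential, 1))   # 3.7 -- approaches n_items
                                         # as shared context dominates
```

The larger the shared context relative to per-item work, the closer the parallel cost gets to a full N-times multiplier — which is why testing on a small batch before scaling matters.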

Monitor Cost Before Scaling Parallel Pipelines

Parallel agents multiply token consumption proportionally. Before running a 20-item parallel pipeline, test with 3 items first and review the cost. Set clear expectations about spend — especially for scheduled pipelines that run automatically. The OpenTelemetry monitoring covered in Module 17 enables cost tracking per pipeline at Team/Enterprise scale.

Designing Your First Parallel Pipeline

The design pattern for any parallel pipeline follows four steps:

  1. Identify the parallelizable unit — what is the "one item" that the template prompt operates on? One document, one bug, one customer interview, one market segment.
  2. Write the template prompt — one prompt that works for any item in the collection. Test it on a single item first. If it does not work on one, it will not work on twenty.
  3. Define the synthesis — how should the parent Co-Work session combine all the individual outputs? A consolidated summary? A comparison table? A ranked list? Define this in the prompt.
  4. Set the scope limit — start with 3–5 items in the first run. Verify the outputs are correct before scaling. A bad template prompt running on 20 items in parallel produces 20 bad outputs.
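Steps 2 and 4 above reward a dry run: render the template against a single item and read the resulting prompt before fanning out. A minimal sketch, with an illustrative template (not an official Co-Work prompt format):

```python
# One template for every item in the collection -- the parallelizable unit.
TEMPLATE = (
    "Analyze this {kind} and identify: (1) the main argument in one "
    "sentence, (2) three supporting claims, (3) one notable quote."
)

def render(kind: str, item: str) -> str:
    # Same prompt for any item; only the source reference changes.
    return f"{TEMPLATE.format(kind=kind)}\n\nSource: {item}"

# Dry-run on one item first: if the prompt fails on one, it fails on twenty.
print(render("article", "competitor-post-1.md"))
```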
Build-Along Exercise

Build a Parallel Research Pipeline

Choose 3–5 articles, documents, or URLs that share a common theme. This could be competitor blog posts, industry news pieces, research papers, or user feedback files. The items should be similar enough that one analysis prompt applies to all of them.

  1. Select 3–5 parallel items. They should be the same type (all articles, all documents, all URLs) and independently analyzable — reading one does not require having read another first.
  2. Write the template prompt. Draft one analysis prompt that applies to any item in your collection. For example: "Analyze this article and identify: (1) the main argument in one sentence, (2) three supporting claims, (3) one notable quote, (4) the target audience." Test it on a single item before going parallel.
  3. Run the parallel pipeline. Tell Co-Work: "I have [N] documents/articles: [list them or reference the folder]. For each one, run this analysis: [paste your template prompt]. Analyze all of them simultaneously and give me a consolidated summary at the end."
  4. Review the synthesis. Verify each item was analyzed individually. Check that the consolidated summary correctly draws on all individual analyses. Note where the synthesis is useful — and where it loses important nuance.
  5. Estimate the time comparison. How long would sequential analysis of these items take? How long did parallel take? The ratio is your parallel pipeline's time-to-value case.

Success criteria: At least 3 items analyzed in parallel with individual results and a consolidated synthesis produced. Time comparison documented. Template prompt ready to reuse for the next collection of the same type.

Knowledge Check
  • I understand that Claude Code Remote is the current sub-agent mechanism — described by Jenny (Anthropic) and Swyx; verified capability but mechanism may evolve
  • I have designed a pipeline that correctly identifies the parallelizable unit and writes a template prompt that works for any item in the collection
  • I have run at least one parallel task across 3+ items and reviewed the consolidated synthesis output
  • I understand the token cost tradeoff: parallel execution is faster but costs proportionally more — I test with 3 items before scaling to 20
  • I can distinguish tasks suited for parallel execution (independent, uniform, no shared dependencies) from those requiring sequential execution (chained outputs, heterogeneous items)