ADR-020: ADK Agent Architecture Refactor

Status: Accepted
Date: 2026-05-01
Driver: Post-ADR-019 audit, which identified 7 structural issues before any Phase 2 code ships

Note: Files referenced in this ADR have moved: tools/pitwall_bridge.py → src/pitwall/main.py; tools/adk_agents.py → src/pitwall/features/coaching/adk_agents.py.

Context

An audit of the ADK system specified in ADR-019 and documented in adk-agent-architecture.md identified seven structural issues that will cause silent failures or performance regressions once Phase 2 code ships. None of them are visible from the API surface — the bridge contract is unchanged — but all of them are cheap to fix now and expensive to fix post-Sonoma when integration tests and field data have accumulated against the broken shapes.

The seven issues, in order of severity:

1. CoachOrchestrator is an LlmAgent with 17 sub-agents. The LLM must parse ~700 words of routing rules at every invocation and then decide which agent to call. Empirically, LLM-based orchestration mis-routes between semantically similar agents: MindsetCoachAgent vs ProgressTrackerAgent, GoalSettingAgent vs SessionPlannerAgent. These agents share vocabulary and the routing signal degrades as the catalogue grows.
2. No ParallelAgent used. The debrief pipeline sequences HighlightFinderAgent → TelemetryAgent → PedagogyAgent → NarrativeAgent. The first three agents are data-fetch operations with no dependency on each other. Running them sequentially costs the sum of their latencies, roughly 3× the concurrent time when the three take similar time. With a 2–15 s paddock latency budget this is directly user-visible.
3. NarrativeAgent receives paraphrased context. The current design lets the LLM orchestrator summarise what the Telemetry and Pedagogy agents found and pass that summary to NarrativeAgent as a string. Every layer of LLM paraphrase is a lossy compression step. NarrativeAgent should consume structured data from session.state, not a natural-language re-telling.
4. query_pitwall_db accepts raw SQL with no LIMIT enforcement. A confused LLM can generate SELECT * FROM telemetry WHERE session_id = ? against a table with millions of rows. The result set blows the context window and may exceed the model's token limit entirely. The read-only connection prevents writes but does nothing about read volume.
5. get_track_variance_map imports track_loader but never uses it. The import is a dead reference. On Termux (the target Pixel 10 runtime) importing track_loader raises ImportError at module load time, breaking the entire agent module before a single tool call is made.
6. VoiceScriptAgent generates TTS scripts with no write tool. The agent produces corner-by-corner TTS phrases but has no mechanism to persist them. Every generated script exists only in the LLM's response text and is lost when the conversation turn ends. The audio cache never grows.
7. Agent descriptions are 150+ words. ADK uses the description field as a routing signal for parent agents and for developer tooling. Verbose descriptions that include examples and caveats degrade routing precision and inflate token usage on every orchestration step.

Decision

Fix all seven issues in a single refactor pass before any Phase 2 ADK code is committed to the main branch. The bridge HTTP surface is unchanged. The public entry point PitwallOrchestrator.run(prompt) wraps asyncio.run() identically to the CoachOrchestrator.run() it replaces — callers in pitwall_bridge.py need no changes.

1. Replace `LlmAgent` orchestrator with a `CustomAgent` (`BaseAgent` subclass)

PitwallOrchestrator inherits from BaseAgent and implements _run_async_impl. Routing is a Python keyword classifier (_classify_intent), not LLM reasoning. The classifier maps the lowercased query string to one of 15 intent buckets using substring matching, in priority order:

Keywords matched	Intent / agent invoked
`gold`, `aj`, `reference`	`GoldLapAgent`
`fog`, `weather`, `rain`	`WeatherAdaptationAgent`
`plan`, `practice`, `laps`	`SessionPlannerAgent`
`incident`, `moment`, `spike`	`IncidentReviewAgent`
`race`, `stint`, `tyre`	`RacePaceAgent`
`target`, `goal`, `pb`	`GoalSettingAgent`
`consistent`, `variance`	`MentalMapAgent`
`audio`, `script`, `tts`	`VoiceScriptAgent`
`turn N`, `carousel` (corner names)	`CornerCoachAgent`
`vs`, `compare`	`LapComparisonAgent`
`progress`, `faster`, `plateau`	`ProgressTrackerAgent`
`setup`, `understeer`, `feels`	`SetupAdvisorAgent`
`frustrat`, `stuck`	`MindsetCoachAgent`
`debrief`, `today`, `highlight`	`DebriefPipeline`
(default)	`TelemetryAgent`

Routing is evaluated top-to-bottom; first match wins, so the order of the rows is part of the routing contract. Extending the catalogue means adding a row to a dict and a new BaseAgent subclass, with no LLM routing prompt to maintain.
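The routing table above can be sketched as an ordered list of patterns. This is a minimal illustration and not the project's `_classify_intent`: the name `classify_intent`, the `ROUTES` structure, and the use of `turn \d+` as the pattern for "turn N" are assumptions made for the example. Only `carousel` is listed as a corner name because it is the only one the table gives.

```python
import re

# Ordered routing table transcribed from the ADR; the first matching row wins.
# Each pattern is a regex alternation, searched as a substring of the lowercased query.
ROUTES: list[tuple[str, str]] = [
    (r"gold|aj|reference", "GoldLapAgent"),
    (r"fog|weather|rain", "WeatherAdaptationAgent"),
    (r"plan|practice|laps", "SessionPlannerAgent"),
    (r"incident|moment|spike", "IncidentReviewAgent"),
    (r"race|stint|tyre", "RacePaceAgent"),
    (r"target|goal|pb", "GoalSettingAgent"),
    (r"consistent|variance", "MentalMapAgent"),
    (r"audio|script|tts", "VoiceScriptAgent"),
    (r"turn \d+|carousel", "CornerCoachAgent"),  # "turn N" pattern is an assumption
    (r"vs|compare", "LapComparisonAgent"),
    (r"progress|faster|plateau", "ProgressTrackerAgent"),
    (r"setup|understeer|feels", "SetupAdvisorAgent"),
    (r"frustrat|stuck", "MindsetCoachAgent"),
    (r"debrief|today|highlight", "DebriefPipeline"),
]
DEFAULT_AGENT = "TelemetryAgent"


def classify_intent(query: str) -> str:
    """Return the name of the first agent whose keywords appear in the query."""
    lowered = query.lower()
    for pattern, agent in ROUTES:
        if re.search(pattern, lowered):
            return agent
    return DEFAULT_AGENT


corner = classify_intent("Where am I losing time in Turn 7?")
mindset = classify_intent("I feel stuck and frustrated")
fallback = classify_intent("What was my top speed on the back straight?")
ordered = classify_intent("Debrief today's race")
```

Because the first match wins, the last query routes to `RacePaceAgent` in this transcription: the `race` row sits above the `debrief` row. Substring matching also means short keywords such as `vs` or `pb` can match inside longer words, which may be worth covering in tests.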

2. Pre-built pipelines using `SequentialAgent` + `ParallelAgent`

Two pipelines replace the flat sequential chain:

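from google.adk.agents import ParallelAgent, SequentialAgent

# Full debrief: gather data concurrently, then narrate.
DebriefPipeline = SequentialAgent(
    name="DebriefPipeline",
    sub_agents=[
        ParallelAgent(
            name="DataGatherPhase",
            sub_agents=[HighlightFinderAgent, TelemetryAgent, PedagogyAgent],
        ),
        NarrativeAgent,
    ],
)

# Brief: pedagogy context only, then narrate.
# Caution: ADK permits one parent per agent instance, so PedagogyAgent and
# NarrativeAgent likely need separate instances from those in DebriefPipeline.
BriefPipeline = SequentialAgent(
    name="BriefPipeline",
    sub_agents=[PedagogyAgent, NarrativeAgent],
)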

DataGatherPhase runs the three data agents concurrently. NarrativeAgent always runs last, after all data is available in session.state. The data-gathering phase drops from the sum of the three agents' latencies (about 3× a single agent when they take similar time) to approximately the latency of the slowest data agent.
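The latency argument can be illustrated with plain `asyncio`, without ADK. This is only a sketch: `fake_agent` and the sleep durations are invented stand-ins for the three data agents, and `asyncio.gather` plays the role that ParallelAgent plays in the pipeline.

```python
import asyncio
import time

# Hypothetical stand-ins: (name, simulated latency in seconds).
AGENTS = [("highlights", 0.05), ("telemetry", 0.10), ("pedagogy", 0.15)]


async def fake_agent(name: str, seconds: float) -> str:
    """Simulate an I/O-bound data agent by sleeping, then return its name."""
    await asyncio.sleep(seconds)
    return name


async def run_sequential() -> tuple[list[str], float]:
    """Run the agents one after another; elapsed time is the sum of latencies."""
    start = time.perf_counter()
    results = [await fake_agent(name, secs) for name, secs in AGENTS]
    return results, time.perf_counter() - start


async def run_parallel() -> tuple[list[str], float]:
    """Run the agents concurrently; elapsed time tracks the slowest agent."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_agent(n, s) for n, s in AGENTS))
    return list(results), time.perf_counter() - start


seq_results, seq_elapsed = asyncio.run(run_sequential())
par_results, par_elapsed = asyncio.run(run_parallel())
print(f"sequential: {seq_elapsed:.2f}s, parallel: {par_elapsed:.2f}s")
```

With unequal latencies like these, the gain is smaller than 3×, because the concurrent phase still waits for the slowest agent.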

3. `output_key` on each data agent; `NarrativeAgent` reads structured state

Each data agent is given an output_key:

HighlightFinderAgent  → output_key="highlights_data"
TelemetryAgent        → output_key="telemetry_data"
PedagogyAgent         → output_key="pedagogy_data"

ADK writes each agent's response to session.state[output_key] automatically. NarrativeAgent's instruction uses template variables:

Context:
Highlights: {highlights_data}
Telemetry:  {telemetry_data}
Pedagogy:   {pedagogy_data}

ADK injects state values before the instruction reaches the LLM. Context flows as structured data, not LLM paraphrase. NarrativeAgent has no output_key — its output is written to conversations via the write_conversation tool.
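The effect of the template variables can be approximated with ordinary string formatting. The sketch below is an analogy, not ADK's implementation: `render_instruction` is a hypothetical helper and the state values are placeholders, not real agent output.

```python
# Placeholder state, keyed by the output_key names listed in this ADR.
session_state = {
    "highlights_data": "<highlights payload>",
    "telemetry_data": "<telemetry payload>",
    "pedagogy_data": "<pedagogy payload>",
}

INSTRUCTION_TEMPLATE = (
    "Context:\n"
    "Highlights: {highlights_data}\n"
    "Telemetry:  {telemetry_data}\n"
    "Pedagogy:   {pedagogy_data}\n"
)


def render_instruction(template: str, state: dict[str, str]) -> str:
    """Substitute {key} placeholders with values from state.

    Raises KeyError if the template names a key that no agent has written,
    which makes a missing upstream result visible immediately.
    """
    return template.format_map(state)


rendered = render_instruction(INSTRUCTION_TEMPLATE, session_state)
print(rendered)
```

The point of the design is that NarrativeAgent sees each data agent's output verbatim under a known key, so nothing is lost to an intermediate summary.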

4. SQL safety wrapper in `query_pitwall_db`

Two guards are injected before execution:

LIMIT enforcement. If the SQL string does not contain the word LIMIT (case-insensitive), the wrapper appends LIMIT 500 before executing. A query that omits LIMIT therefore returns at most 500 rows; a query that supplies its own LIMIT is executed with that value unchanged.
Non-SELECT rejection. If the normalised query does not begin with SELECT, the tool returns a structured error dict rather than executing. This is defense-in-depth — read_only=True at the connection level already blocks writes, but an early rejection with a clear error message is faster and produces better LLM self-correction than a DuckDB exception traceback.

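import re

import duckdb

# DB_PATH and the @tool decorator are assumed to be defined elsewhere in this module.


@tool
def query_pitwall_db(sql: str) -> list[dict]:
    """Query pitwall session data. Read-only. Max 500 rows."""
    normalised = sql.strip().upper()
    # Guard 1: reject anything that is not a SELECT before touching the database.
    if not normalised.startswith("SELECT"):
        return [{"error": "Only SELECT statements are permitted."}]
    # Guard 2: append a row cap when the query has no LIMIT keyword of its own.
    # The word-boundary match avoids false hits on identifiers such as rate_limit.
    if not re.search(r"\bLIMIT\b", normalised):
        sql = sql.rstrip().rstrip(";") + " LIMIT 500"
    conn = duckdb.connect(DB_PATH, read_only=True)
    try:
        return conn.execute(sql).fetchdf().to_dict("records")
    finally:
        conn.close()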
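The two guards can be exercised without a database by separating them into a pure function. This is a sketch under assumptions: `guard_sql` is a hypothetical helper, and it matches LIMIT on word boundaries so that an identifier such as `rate_limit` does not count as a LIMIT clause.

```python
import re
from typing import Optional

MAX_ROWS = 500


def guard_sql(sql: str, max_rows: int = MAX_ROWS) -> tuple[Optional[str], Optional[str]]:
    """Apply the two guards; return (sql_to_run, None) or (None, error_message)."""
    normalised = sql.strip().upper()
    # Guard 1: only SELECT statements are accepted.
    if not normalised.startswith("SELECT"):
        return None, "Only SELECT statements are permitted."
    # Guard 2: add a row cap when the query carries no LIMIT keyword.
    if not re.search(r"\bLIMIT\b", normalised):
        sql = sql.rstrip().rstrip(";") + f" LIMIT {max_rows}"
    return sql, None


capped = guard_sql("SELECT * FROM telemetry WHERE session_id = 7;")
untouched = guard_sql("select lap from laps limit 10")
rejected = guard_sql("DROP TABLE laps")
identifier = guard_sql("SELECT rate_limit FROM cfg")
```

Keeping the guard logic pure makes it easy to unit-test the cases an LLM is likely to produce, including trailing semicolons and lowercase keywords.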

5. Remove dead `track_loader` import from `get_track_variance_map`

The import from tools import track_loader in get_track_variance_map is deleted. The function never calls anything from that module. On Termux the import raises ImportError at module load time; removing it makes the tool importable on all target platforms.

6. `save_voice_scripts` tool for `VoiceScriptAgent`

A new tool persists TTS phrase sets to disk:

import json

# AUDIO_CACHE_DIR (a pathlib.Path) and the @tool decorator are assumed to be
# defined elsewhere in this module.


@tool
def save_voice_scripts(corner_name: str, phrases: dict[str, str]) -> str:
    """Write TTS phrases for a corner to the audio cache.

    Args:
        corner_name: Identifier for the corner (e.g. "turn_7", "carousel").
        phrases:     Mapping of phase → phrase, e.g.
                     {"brake": "Brake now", "apex": "Roll to apex"}.
    Returns:
        Absolute path of the written file.
    """
    path = AUDIO_CACHE_DIR / f"{corner_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(phrases, indent=2))
    # resolve() makes the returned path absolute even if AUDIO_CACHE_DIR is relative.
    return str(path.resolve())

Output path: tools/audio_cache/<corner_name>.json. Format: {"phase": "TTS phrase", ...}. VoiceScriptAgent receives this tool in its tools list. Scripts generated during a session now persist and are available to the in-drive rally co-driver path (see Termux plan in project memory).
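A round trip through the cache format might look like the following sketch. It is illustrative only: `save_phrases` and `load_phrases` are hypothetical helpers that take the cache directory as an argument, and a temporary directory stands in for tools/audio_cache so the example leaves nothing on disk. The phrases are the ones from the docstring above.

```python
import json
import tempfile
from pathlib import Path


def save_phrases(cache_dir: Path, corner_name: str, phrases: dict[str, str]) -> Path:
    """Write one corner's phrases as <corner_name>.json under cache_dir."""
    path = cache_dir / f"{corner_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(phrases, indent=2))
    return path


def load_phrases(cache_dir: Path, corner_name: str) -> dict[str, str]:
    """Read a corner's phrases back; return an empty dict if none were saved."""
    path = cache_dir / f"{corner_name}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "audio_cache"
    written = save_phrases(cache, "turn_7", {"brake": "Brake now", "apex": "Roll to apex"})
    loaded = load_phrases(cache, "turn_7")
    missing = load_phrases(cache, "carousel")
    written_name = written.name
```

One file per corner keeps writes independent, so regenerating the script for a single corner does not touch the others.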

7. Agent descriptions capped at ~30 words

Every agent description is rewritten to be keyword-rich and distinctive, with no examples, no caveats, and no prose. The complete 17-agent description set is specified in adk-agent-architecture.md (updated alongside this ADR). Routing reliability increases because the signal-to-noise ratio in the description field improves.
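The word cap could be kept honest with a small check run in tests or CI. The sketch below is an assumption about how that might be done: `over_budget` is a hypothetical helper, and both descriptions are placeholders rather than the real text from adk-agent-architecture.md.

```python
MAX_WORDS = 30


def over_budget(descriptions: dict[str, str], max_words: int = MAX_WORDS) -> dict[str, int]:
    """Return {agent_name: word_count} for every description above max_words."""
    offenders: dict[str, int] = {}
    for name, text in descriptions.items():
        count = len(text.split())  # whitespace-delimited word count
        if count > max_words:
            offenders[name] = count
    return offenders


# Placeholder descriptions for illustration only.
sample = {
    "GoldLapAgent": "Compares driver laps against the gold reference lap.",
    "VerboseAgent": " ".join(["word"] * 31),
}
offenders = over_budget(sample)
```

Since the ADR states the cap as approximate, a real check might warn rather than fail when a description is a few words over.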

Consequences

Positive. Routing is now deterministic Python: intent classification no longer depends on an LLM parsing a growing routing prompt, and a given query always routes to the same agent, so any mis-route is reproducible and fixable in the keyword table. The debrief pipeline runs data agents concurrently, cutting the data-gathering phase from the sum of three agent latencies to roughly that of the slowest one (up to 3×). NarrativeAgent receives structured state rather than paraphrased prose, eliminating lossy summarisation. Queries that omit LIMIT are capped at 500 rows, which bounds the result set an agent can pull into the context window. Voice scripts persist across sessions and are available to the audio cache consumed by the in-drive path. The dead import that would break Termux module loading is gone.

Negative / risks. _classify_intent is a keyword classifier, not semantic understanding. A novel query that uses none of the registered keywords falls through to TelemetryAgent (the default). This is a safe fallback for the current use case — most paddock queries are telemetry questions — but may need a lightweight embedding classifier post-Sonoma if the /coach/ask catalogue expands significantly.

Unchanged. The bridge HTTP surface is identical. PitwallOrchestrator.run(prompt) wraps asyncio.run(_run_async_impl(...)) and returns the same {text, emotion} dict that CoachOrchestrator.run() would have returned. RuleCoach, CoachArbiter, and LitertCoach are untouched.

References

ADR-017 — three-tier latency split
ADR-019 — original ADK topology this ADR revises
adk-agent-architecture.md — full agent topology, tool specs, updated per this ADR
