ADK Agent Architecture¶

Companion to ADR-019, ADR-020, ADR-021, and ADR-022.

Current state (2026-05-26): All phases implemented. Backend selector landed 2026-05-12 — paddock ADK speaks to LocalLLM, an Apache-2.0 Android APK (github.com/mlnomadpy/localllm) that hosts LiteRT-LM and exposes an OpenAI-compatible HTTP server. On the desktop default it listens on 127.0.0.1:8099/v1; the phone deploy (deploy/phone/70-start-bridge.sh) sets PITWALL_ADK_OPENAI_URL to http://localhost:8080/v1 to match LocalLLM's Android default port. Two legacy paths (lit serve via Gemini(base_url=...), in-process LitertLmModel) remain available and are byte-identical from the agents' point of view.

Phone install gap (2026-05-26). google-adk does not install on Termux aarch64 — there are no android_arm64_v8a wheels for cffi, cryptography, watchdog, or pydantic-core, and adb shell su <uid> doesn't have working DNS, so pip install from PyPI fails outright on the phone. 16 MB of pre-staged ADK wheels live at ~/adk-wheels on the Pixel for an offline install once Termux's DNS is back. Until then, every ADK-backed endpoint (/coach/ask, /coach/agents, /coach/traces) returns honest {available: false, reason: "google-adk not installed"}.

Guiding constraints¶

ADK never touches the hot path. In-drive coaching (< 100 ms) stays as RuleCoach + CoachArbiter. ADK is paddock-only.
DuckDB writes belong to the bridge. All ADK tools query DuckDB read_only=True. The bridge is the sole writer — except save_voice_scripts, which writes to tools/audio_cache/ (JSON files, not DuckDB).
LocalLLM is the sole paddock LLM transport — never a hosted API. Per ADR-024 (superseding ADR-022), every ADK call goes through LiteLlm to LocalLLM on 127.0.0.1:8099/v1. google-adk and litellm are base dependencies of apps/edge-daemon — the bridge fails to start without them. The earlier PITWALL_ADK_BACKEND selector and its engine / litertlm branches are retired.

LocalLLM is a separate Apache-2.0 Android APK that hosts LiteRT-LM in a native Android process with GPU/NPU delegate access via LiteRT's AUTO backend, owns model lifecycle through its in-app catalog, and speaks OpenAI's chat.completions with SSE streaming and signed-bearer-token auth. The same LiteLlm client also covers Ollama / LM Studio / llama.cpp / vLLM on dev workstations — point PITWALL_ADK_OPENAI_URL at them and nothing else changes.

The warm path is on the same transport. LitertCoach.brief() and LitertCoach.debrief() (coach_engine.py) post directly to LocalLLM's /v1/chat/completions via stdlib urllib.request per ADR-025. The in-process litert_lm.Engine warm-path branch was retired; warm and paddock share one transport contract.

One transport, three tiers:

Path	Runtime	Model	How invoked	Latency budget
In-drive (hot)	`RuleCoach` + canonical phrases	—	Pure Python	< 100 ms
Warm (brief / debrief)	LocalLLM (HTTP → `127.0.0.1:8099/v1`)	Gemma 4 E2B	`urllib.request` POST	2–4 s brief / 8–15 s debrief
Paddock (ADK)	LocalLLM via `LiteLlm` (HTTP → `127.0.0.1:8099/v1`)	Gemma 4 E4B / E2B	`LiteLlm(model="openai/<id>", api_base=...)`	2–15 s

Routing is deterministic Python. PitwallOrchestrator uses _classify_intent() — a keyword classifier — not LLM routing. This eliminates mis-routing between similar agents.
SQL queries are bounded. query_pitwall_db enforces LIMIT 500 and rejects non-SELECT. No agent can blow the context window via a table scan.
All agent calls go through run_adk(). BaseAgent has no .run() shortcut; the canonical path is Runner.run_async() wrapped in asyncio.run().
Persistent sessions for KV cache reuse. InMemorySessionService sessions are reused per driver within a process lifetime. lit serve clones the KV cache across turns — session system instructions only prefill once per driving day.

Full topology¶

src/pitwall/__main__.py (Flask, sync)
    │
    │  run_adk(prompt, user_id)          ← _get_or_create_session(user_id)
    │  _drain_adk_traces(pitwall_sid)    ← get_pending_traces() → agent_traces DuckDB
    │  _reset_adk_session(driver_id)     ← called by POST /session/start
    │
    ▼
Runner(PitwallOrchestrator, InMemorySessionService, plugins=[PitwallTracingPlugin])
    │
    ▼
PitwallOrchestrator(BaseAgent)
    _classify_intent(query) → deterministic keyword routing
    │
    ├── "debrief"  →  DebriefPipeline (SequentialAgent)
    │                   ├── DebriefDataPhase (ParallelAgent)  ← 3× concurrent
    │                   │     ├── HighlightFinderAgent  output_key=highlights_data
    │                   │     ├── TelemetryAgent        output_key=telemetry_data
    │                   │     └── PedagogyAgent         output_key=pedagogy_data
    │                   └── NarrativeAgentDebrief       reads {highlights_data}
    │                                                        {telemetry_data}
    │                                                        {pedagogy_data}
    │
    ├── "brief"    →  BriefPipeline (SequentialAgent)
    │                   ├── PedagogyAgent                output_key=pedagogy_data
    │                   └── NarrativeAgentBrief          reads {pedagogy_data}
    │
    └── QA intent  →  single specialist agent (14 paths)
          gold_lap       → GoldLapAgent
          weather        → WeatherAdaptationAgent
          session_plan   → SessionPlannerAgent
          incident       → IncidentReviewAgent
          race_pace      → RacePaceAgent
          goal           → GoalSettingAgent
          mental_map     → MentalMapAgent
          voice_script   → VoiceScriptAgent
          lap_comparison → LapComparisonAgent
          corner         → CornerCoachAgent
          progress       → ProgressTrackerAgent
          setup          → SetupAdvisorAgent
          mindset        → MindsetCoachAgent
          agent_meta     → AgentMetaAgent
          telemetry      → TelemetryAgent  (default)

PitwallOrchestrator¶

BaseAgent subclass at src/pitwall/features/coaching/adk_agents.py. _run_async_impl(ctx) reads ctx.user_content.parts[0].text, calls _classify_intent(), then async for event in pipeline.run_async(ctx): yield event.

Intent classifier — _classify_intent(query: str) -> str — evaluates keywords top-to-bottom, first match wins. Order matters: whole-flow intents (debrief, brief, voice_script) precede corner so that "brief me on T6" routes to BriefPipeline, not CornerCoachAgent (audit fix 2026-05-12).

#	Keywords matched	Intent	Agent / pipeline
1	`debrief`, `how did i do`, `session summary`, `review my session`	`debrief`	DebriefPipeline
2	`brief`, `pre-session`, `before i go out`, `today's plan`	`brief`	BriefPipeline
3	`voice script(s)`, `cue script(s)`, `tts`, `pace note(s)`, `audio cue(s)`, `generate cue/voice/audio …`	`voice_script`	VoiceScriptAgent
4	`TN` / `turn N`, `carousel`, `bus stop`	`corner`	CornerCoachAgent
5	`gold lap`, `reference lap`, `AJ`	`gold_lap`	GoldLapAgent
6	`weather`, `fog`, `conditions`, `greasy`, `track temp`	`weather`	WeatherAdaptationAgent
7	`practice plan`, `laps available`, `i have N laps`	`session_plan`	SessionPlannerAgent
8	`incident`, `close call`, `scary`, `saved it`, `moment at`	`incident`	IncidentReviewAgent
9	`race pace`, `stint`, `degradation`, `tyre drop`	`race_pace`	RacePaceAgent
10	`pb target`, `lap time goal`, `target lap`, `set me a goal`	`goal`	GoalSettingAgent
11	`variance`, `consistency`, `consistent`, `inconsistent`, `mental map`, `repeatable`, `repeatability`, `stable`	`mental_map`	MentalMapAgent
12	`lap N vs`, `compare lap`, `why was lap`, `fastest vs slowest`	`lap_comparison`	LapComparisonAgent
13	`progress`, `improving`, `getting faster`, `over sessions`	`progress`	ProgressTrackerAgent
14	`setup`, `understeer`, `oversteer`, `balance`, `car feel`	`setup`	SetupAdvisorAgent
15	`frustrated`, `plateau`, `not working`, `motivation`	`mindset`	MindsetCoachAgent
16	`slowest`/`latency`/`tool call`/`agent trace` + `agent`	`agent_meta`	AgentMetaAgent
	(default)	`telemetry`	TelemetryAgent

Routing escape hatch. POST /coach/ask accepts an optional intent field (bp_coaching.py) that bypasses _classify_intent() entirely. The orchestrator reads temp:intent_override from session state and falls back to the regex classifier only when the override is empty or not in _VALID_INTENTS. Use this when the natural-language router would misroute and you know which agent should run. Valid intents are the keys of the table above.

Pipelines¶

DebriefPipeline¶

_debrief_data_phase = ParallelAgent(
    name="DebriefDataPhase",
    sub_agents=[highlight_finder_agent, telemetry_agent, pedagogy_agent],
)
debrief_pipeline = SequentialAgent(
    name="DebriefPipeline",
    sub_agents=[_debrief_data_phase, _narrative_debrief],
)

Three data agents run concurrently. Each writes to session.state via output_key. NarrativeAgentDebrief runs after all three complete — wall-clock time is 1× the slowest data agent, not the sum.

BriefPipeline¶

brief_pipeline = SequentialAgent(
    name="BriefPipeline",
    sub_agents=[pedagogy_agent, _narrative_brief],
)

PedagogyAgent runs first, writes pedagogy_data to session state. NarrativeAgentBrief generates the pre-session brief from that structured context.

Narrative agent instances¶

_narrative_debrief and _narrative_brief are separate Agent instances with identical instruction templates. Separate instances prevent session-state bleed if requests overlap. Both share the same template:

Session highlights: {highlights_data}
Telemetry analysis: {telemetry_data}
Pedagogy context:   {pedagogy_data}

[Output format rules + EMOTION tag instruction]

A third instance narrative_agent is used for QA paths.

Agent catalogue¶

23 specialist agents exposed via AGENT_REGISTRY (/coach/agents).

Foundation QA agents (15, ADR-019/020/021) — TelemetryAgent, LapComparisonAgent, CornerCoachAgent, ProgressTrackerAgent, SetupAdvisorAgent, MindsetCoachAgent, GoldLapAgent, WeatherAdaptationAgent, SessionPlannerAgent, IncidentReviewAgent, RacePaceAgent, GoalSettingAgent, MentalMapAgent, VoiceScriptAgent, AgentMetaAgent.

Phase-2 AiM-aware specialists (6, 2026-05-28) — each owns one AiM signal domain end-to-end and publishes its findings under a named output_key that the brief / debrief narrative templates cite:

TireManagerAgent (tire_data) — TPMS pressure / temperature window
alarms. Cold→hot delta, per-corner balance, cold-pressure target advice.
HandlingBalanceAgent (handling_data) — measured understeer / oversteer per corner via yaw rate × steering bicycle model (E46 M3 wheelbase 2.731 m, ratio 15.4:1). Surfaces YAML §7 sign-convention warning.
EngineHealthAgent (engine_health_data) — S54 vitals: oil-pressure floor under load, coolant / oil temp drift, fuel pressure under brake. Sentinel-aware (drops AiM 0xFFFF no-reading marker).
TractionAgent (traction_data) — wheelspin / lockup events from per-wheel speed deltas, attributed to corner segments.
InputQualityAgent (input_quality_data) — steering oscillation, throttle modulation rate, brake-release shape; 0–100 smoothness score.
SafetyMonitorAgent (safety_data) — ABS / DSC / MIL / TPMS alarm timeline. Explains pace drops via active safety events.

Pipeline-only data agents (2) in the registry but never routed by _classify_intent — they run inside DebriefPipeline / BriefPipeline: HighlightFinderAgent, PedagogyAgent.

Additional internal Agent instances exist and are NOT in the registry: NarrativeAgentBrief, NarrativeAgentDebrief, the 9 pipeline copies of the data agents (TelemetryAgentDebrief, HighlightFinderAgentDebrief, PedagogyAgentDebrief, TireManagerAgentDebrief, etc.), and 3 brief-side copies (PedagogyAgentBrief, TireManagerAgentBrief, EngineHealthAgentBrief, SafetyMonitorAgentBrief). They share identical instruction templates with their siblings; separate instances prevent session-state bleed across overlapping requests and satisfy ADK's single-parent invariant inside the parallel data phases.

All agents share the same _model symbol set at module-load — a LiteLlm(model="openai/<id>", api_base=..., api_key=...) instance dialling LocalLLM. See Model transport below.

Pipeline data agents (with `output_key`)¶

Agent	`output_key`	Tools
`TelemetryAgent`	`telemetry_data`	`query_pitwall_db`, `get_session_highlights`, `get_safety_events`
`HighlightFinderAgent`	`highlights_data`	`get_session_highlights`, `get_input_smoothness`, `get_tire_thermal_window`, `query_pitwall_db`
`PedagogyAgent`	`pedagogy_data`	`query_pitwall_db`
`TireManagerAgent`	`tire_data`	`get_tire_thermal_window`, `query_pitwall_db`
`HandlingBalanceAgent`	`handling_data`	`get_handling_balance`, `query_pitwall_db`
`EngineHealthAgent`	`engine_health_data`	`get_engine_health_timeline`, `query_pitwall_db`
`TractionAgent`	`traction_data`	`get_traction_events`, `query_pitwall_db`
`InputQualityAgent`	`input_quality_data`	`get_input_smoothness`, `query_pitwall_db`
`SafetyMonitorAgent`	`safety_data`	`get_safety_events`, `query_pitwall_db`
`NarrativeAgentDebrief` / `NarrativeAgentBrief`	(none)	(none)

QA specialist agents¶

Agent	Tools
`LapComparisonAgent`	`get_lap_delta`, `get_engine_health_timeline`, `get_safety_events`, `query_pitwall_db`
`CornerCoachAgent`	`get_corner_history`, `get_handling_balance`, `get_traction_events`, `query_pitwall_db`
`ProgressTrackerAgent`	`get_progress_report`, `query_pitwall_db`
`SetupAdvisorAgent`	`get_setup_indicators`, `get_handling_balance`, `get_input_smoothness`, `get_tire_thermal_window`, `query_pitwall_db`
`MindsetCoachAgent`	`get_progress_report`, `get_corner_history`, `query_pitwall_db`
`GoldLapAgent`	`get_gold_lap_comparison`, `query_pitwall_db`
`WeatherAdaptationAgent`	`get_weather_adaptation_context`, `get_tire_thermal_window`, `query_pitwall_db`
`SessionPlannerAgent`	`get_session_plan_context`, `get_tire_thermal_window`, `query_pitwall_db`
`IncidentReviewAgent`	`get_incident_moments`, `get_safety_events`, `get_traction_events`, `query_pitwall_db`
`RacePaceAgent`	`get_race_pace_model`, `get_engine_health_timeline`, `get_tire_thermal_window`, `query_pitwall_db`
`GoalSettingAgent`	`get_goal_targets`, `get_progress_report`, `query_pitwall_db`
`MentalMapAgent`	`get_track_variance_map`, `query_pitwall_db`
`VoiceScriptAgent`	`get_audio_script_context`, `save_voice_scripts`, `query_pitwall_db`
`AgentMetaAgent`	`get_agent_telemetry`

Brief / debrief narrative-slot map¶

The narrative templates read these output_key slots when assembling the final brief / debrief text. Empty slots collapse to "" via ADK's {key?} optional binding — pipelines stay valid when a data agent has nothing to say (e.g. no safety events on a clean session).

{highlights_data?}    ← HighlightFinderAgentDebrief
{telemetry_data?}     ← TelemetryAgentDebrief
{pedagogy_data?}      ← PedagogyAgent{Brief,Debrief}
{tire_data?}          ← TireManagerAgent{Brief,Debrief}
{handling_data?}      ← HandlingBalanceAgentDebrief    (debrief only)
{engine_health_data?} ← EngineHealthAgent{Brief,Debrief}
{traction_data?}      ← TractionAgentDebrief           (debrief only)
{input_quality_data?} ← InputQualityAgentDebrief       (debrief only)
{safety_data?}        ← SafetyMonitorAgent{Brief,Debrief}

DebriefDataPhase = ParallelAgent over 9 data agents. BriefDataPhase = ParallelAgent over 4 (pedagogy + tire + engine + safety — the domains that carry from the prior session into today's pre-brief).

Tools specification¶

All 21 tools live in apps/edge-daemon/pitwall/adk_tools.py. All decorated with @_adk_tool (an identity passthrough — ADK 1.32 registers tools by being passed into Agent(tools=[...]) directly).

Phase-2 AiM-aware tools (added 2026-05-28)¶

get_tire_thermal_window(session_id) — per-corner TPMS pressure / temperature window + alarm bitfield timeline (air leak / low temp / sensor fail). Source: telemetry_signals JOIN signal_registry on tpms_press_*_bar, tpms_temp_*_c, tpms_alm_*.
get_handling_balance(session_id, corner_name?) — measured vs. expected yaw rate per corner via bicycle model with E46 M3 constants. Folds cumulative distance via % TRACK_LENGTH_M. Flags YAML §7 sign-convention warning when counter-steer events > 50 % of samples.
get_engine_health_timeline(session_id) — oil / water / fuel pressure + oil temp aggregates and anomaly markers (oil-pressure starvation under RPM > 3000 + throttle > 10 %, coolant > 105 °C). Sentinel-aware: drops 0xFFFF markers (psi → 4519 bar after scale per YAML §8).
get_traction_events(session_id, slip_threshold_kmh=5.0) — wheelspin (rear-axle > front-axle under throttle) and lockup (body > front-axle under brake) events, attributed to corner segments via lap-modulo distance.
get_input_smoothness(session_id) — steering oscillation (frame-to-frame stddev), throttle modulation rate (mean |Δ throttle|), median brake-release delta, brake-release event count, 0–100 smoothness score, verdict (smooth / competent / choppy).
get_safety_events(session_id) — ABS / DSC / MIL / brake-switch / TPMS alarm timeline. Returns ordered events with first-occurrence timestamp and total count.

`query_pitwall_db(sql)`¶

Read-only DuckDB query. Safety layer: rejects non-SELECT, auto-injects LIMIT 500. Tables: laps, telemetry, coaching_notes, telemetry_signals, sessions, driver_events, llm_friction, conversations, agent_traces.

`get_lap_delta(session_id, lap_a, lap_b)`¶

Frame-by-frame delta between two laps: time, speed, coast pct.

`get_corner_history(driver_id, corner_name, n_sessions=10)`¶

Grade history + coaching notes + improvement trend for one corner across N sessions.

`get_progress_report(driver_id, n_sessions=10)`¶

Multi-session arc: lap time trend, improving/regressing/stable corners, plateau detection.

`get_setup_indicators(session_id)`¶

Telemetry patterns indicating car balance issues: coast ratio, steer oscillation, brake pressure.

`get_session_highlights(session_id)`¶

Best lap, peak grip moment, coaching note counts, worst coast lap.

`get_gold_lap_comparison(session_id)`¶

Driver's best lap vs AJ's gold standard. Corner-by-corner speed gap + lap-time leverage weights.

`get_weather_adaptation_context(hour_local)`¶

Sonoma's 4 weather phases → concrete line, braking, and tyre warm-up advice per corner.

`get_session_plan_context(driver_id, n_laps=10)`¶

Weakest corners + leverage weights → structured N-lap practice plan data.

`get_incident_moments(session_id, combo_g_threshold, steer_spike_threshold)`¶

Over-limit grip events, emergency brakes, steering saves from telemetry.

`get_race_pace_model(session_id)`¶

Lap degradation model: quali pace, race pace median, consistency score, degradation s/lap.

`get_goal_targets(driver_id)`¶

Realistic PB targets from improvement rate. Top 3 corners by (100 - score) × leverage.

`get_track_variance_map(session_id)`¶

Corner-by-corner speed variance from telemetry. High CV = inconsistent.

`get_agent_telemetry(n_recent=50)`¶

Queries agent_traces table: slowest agents by avg latency, top tools by call count, recent trace rows.

`get_audio_script_context(corner_name, driver_level)`¶

Returns corner tip, leverage pct, TROD voice examples, and script guidelines for VoiceScriptAgent.

`save_voice_scripts(corner_name, scripts)`¶

Writes generated TTS phrases to tools/audio_cache/<corner>.json. Uses fcntl.flock(LOCK_EX) + os.replace() for atomic concurrent writes.

Model transport¶

Per ADR-024 (superseding ADR-022) the paddock tier has a single transport — LiteLlm dialling LocalLLM on 127.0.0.1:8099/v1. There is no env-selectable alternative; reaching a different OpenAI-compatible server (Ollama, LM Studio, llama.cpp --server, vLLM, …) is done by pointing PITWALL_ADK_OPENAI_URL at it.

# adk_agents.py — module load (ADR-024)
_MODEL_ID = get_env_with_legacy(
    "PITWALL_ADK_OPENAI_MODEL", "PITWALL_LITERT_MODEL", "gemma3n-e2b")
_MODEL_URL = get_env_with_legacy(
    "PITWALL_ADK_OPENAI_URL", "PITWALL_LITERT_URL",
    "http://localhost:8099/v1")
_LITELLM_MODEL = _MODEL_ID if "/" in _MODEL_ID else f"openai/{_MODEL_ID}"
_model = LiteLlm(
    model=_LITELLM_MODEL,                              # litellm provider prefix
    api_base=_MODEL_URL,                               # → LocalLLM at :8099/v1
    api_key=get_env_with_legacy(
        "PITWALL_ADK_OPENAI_API_KEY", "PITWALL_LITERT_API_KEY",
        "lit-serve-not-required"),
)

Environment variables¶

Variable	Default	Purpose
`PITWALL_ADK_OPENAI_URL`	`http://localhost:8099/v1`	LocalLLM endpoint; shared with the warm-path `LitertCoach`. Legacy: `PITWALL_LITERT_URL`
`PITWALL_ADK_OPENAI_MODEL`	`gemma3n-e2b`	Model id (must match what LocalLLM has loaded). Legacy: `PITWALL_LITERT_MODEL`
`PITWALL_ADK_OPENAI_API_KEY`	`lit-serve-not-required`	LocalLLM signed bearer token. Legacy: `PITWALL_LITERT_API_KEY`
`PITWALL_ADK_TIMEOUT_S`	`45`	Per-request timeout
`PITWALL_ADK_CHAR_BUDGET`	`60000`	ADK session rotation char budget
`PITWALL_LITERT_HTTP_TIMEOUT_S`	`30`	Warm-path HTTP client timeout

Retired by ADR-024: PITWALL_ADK_BACKEND, PITWALL_LITERTLM_PATH, PITWALL_LITERTLM_BUDGET. The legacy PITWALL_LITERT_* aliases on URL / MODEL / API_KEY are still read (with a DeprecationWarning on first use) via pitwall._env.get_env_with_legacy.

What's load-bearing identical to ADR-019/021¶

The 18 agents, the orchestrator, the pipelines, all 15 tools.
KV-cache reuse via persistent ADK sessions per driver.
The agent_traces DuckDB schema and PitwallTracingPlugin hook.
The [EMOTION:x] tag contract in every system prompt.
The privacy guarantee — LiteLlm speaks only to the configured api_base, which defaults to 127.0.0.1.
LiteLlm (litellm) normalises tool-call shape from OpenAI to ADK's internal schema; agents see ADK-shaped tool calls regardless of which OpenAI-compatible server is on the other end.

Runner and invocation¶

# adk_agents.py — all internal
_session_service = InMemorySessionService()
_runner = Runner(
    agent=coach_orchestrator,
    app_name="pitwall",
    session_service=_session_service,
    plugins=[PitwallTracingPlugin()],
)

# Public API — src/pitwall/__main__.py calls these
run_adk(prompt, user_id="driver") -> str      # sync, thread-safe via asyncio.run()
reset_driver_session(user_id)                  # expire session (call at /session/start)
get_pending_traces() -> list[dict]             # drain trace buffer for DuckDB write

BaseAgent has no .run() shortcut — Runner.run_async() is the only path.

KV cache and persistent sessions¶

KV reuse happens at the ADK session layer — pitwall keeps the same InMemorySessionService session alive per driver per process. The actual KV-cache reuse then depends on the upstream OpenAI-compatible server:

LocalLLM (production target) — LiteRT-LM 0.11 keeps a per-context KV slot warm across turns; reused sessions skip system-instruction prefill.
Ollama / llama.cpp --server — same per-context KV reuse.
vLLM — exposes prefix caching keyed on the prompt prefix, which ADK's session reuse keeps stable.

In all cases ADK's persistent-session strategy guarantees the prompt prefix the upstream server sees is byte-identical across turns, which is the precondition every implementation keys on.

_driver_sessions: dict[str, str]    # user_id → ADK session_id
_session_turn_count: dict[str, int] # auto-reset at _SESSION_MAX_TURNS = 50

Lifecycle: 1. POST /session/start → reset_driver_session(driver_id) — intentional cold reset, acceptable once per day 2. First run_adk() call → creates ADK session, stores in _driver_sessions 3. Subsequent calls same day → _session_service.get_session() → LiteRT-LM clones KV cache 4. After 50 turns → auto-rotation to prevent context overflow 5. Process restart → all sessions lost (InMemory), next call creates fresh session

Expected impact: System instruction tokens (~100–300 per agent) skip re-prefill on warm calls → ~30–50% prefill reduction → ~0.5–3 s saved per call on Tensor G5 NPU.

Agent telemetry (DuckDB)¶

PitwallTracingPlugin(BasePlugin) hooks into Runner and logs every agent run and tool call to a module-level deque. src/pitwall/__main__.py drains it after every run_adk() call.

CREATE TABLE agent_traces (
    id          INTEGER PRIMARY KEY,
    trace_id    VARCHAR,    -- ADK session UUID — groups one run_adk() call
    pitwall_sid VARCHAR,    -- pitwall session_id (empty for Q&A)
    agent_name  VARCHAR,
    event_type  VARCHAR,    -- 'agent' | 'tool'
    detail      VARCHAR,    -- tool name for tool events
    latency_ms  DOUBLE,     -- wall-clock ms for agent events; NULL for tool events
    success     BOOLEAN,
    ts          TIMESTAMP
);

Useful queries:

-- Which agents are slowest?
SELECT agent_name, ROUND(AVG(latency_ms), 1) as avg_ms, COUNT(*) as runs
FROM agent_traces WHERE event_type = 'agent'
GROUP BY agent_name ORDER BY avg_ms DESC;

-- Most-called tools
SELECT detail, COUNT(*) FROM agent_traces
WHERE event_type = 'tool' GROUP BY detail ORDER BY 2 DESC;

-- Full trace for one run_adk() call
SELECT agent_name, event_type, detail, latency_ms, ts
FROM agent_traces WHERE trace_id = ? ORDER BY ts;

AgentMetaAgent can query this table directly via get_agent_telemetry tool.

Conversation persistence¶

All brief/debrief narratives and Q&A turns persist to the conversations table.

CREATE TABLE conversations (
    id           INTEGER PRIMARY KEY,
    session_id   VARCHAR,
    driver_id    VARCHAR,
    role         VARCHAR,   -- 'coach_brief' | 'coach_debrief' | 'user' | 'assistant'
    text         TEXT,
    focus_items  VARCHAR,   -- JSON array
    emotion      VARCHAR,
    recorded_at  TIMESTAMP DEFAULT now()
);

Q&A turns buffer in _qa_histories (in-memory, TTL = 1 hour) and flush to DuckDB on POST /coach/ask/end.

Read endpoints: - GET /conversations/<session_id> — all turns for a session - GET /conversations/driver/<driver_id> — brief/debrief history across sessions

Bridge integration points¶

Bridge call	What it does
`run_adk(prompt, user_id)`	Runs `PitwallOrchestrator` via `Runner`, returns final text
`_drain_adk_traces(pitwall_sid)`	Flushes `get_pending_traces()` → `agent_traces` DuckDB
`_reset_adk_session(driver_id)`	Expires ADK session (cold KV reset at session start)
`POST /session/start`	Calls `_reset_adk_session(driver_id)` automatically
`POST /coach/ask`	Calls `run_adk(prompt)`, buffers turns in `_qa_histories`
`POST /coach/ask/end`	Flushes `_qa_histories` to `conversations` table
`GET /coach/agents`	Returns `AGENT_REGISTRY` for Vue PWA discovery
`GET /coach/traces?session_id=&limit=&since_ts=`	Recent rows from `agent_traces` DuckDB table. Always HTTP 200; `available: false` when `google-adk` or DuckDB is absent. Response: `{available, traces:[{trace_id, pitwall_sid, agent_name, event_type, detail, latency_ms, success, ts}], count, reason?}`. `limit` defaults to 200 (max 1000); `since_ts` enables incremental polling.

What stays unchanged¶

RuleCoach and CoachArbiter — hot path, untouched
LitertCoach.propose() — still delegates to RuleCoach per ADR-017
All existing Flask endpoints and their JSON contracts
llm_friction table — still receives LLM performance metadata
RuleCoach + canonical phrase library (hot path)

Startup recipes¶

A. Pixel 10 + LocalLLM APK (production)¶

Install LocalLLM as a regular Android APK, pick a Gemma 4 .litertlm from its in-app catalog, copy the bearer token from its Settings screen, and let the bridge in Termux talk to it over 127.0.0.1. google-adk and litellm are base deps of apps/edge-daemon, so a fresh uv sync is all you need.

# On the Pixel — one-time setup:
#   1. Install LocalLLM APK (adb install -r app-debug.apk, or build from
#      github.com/mlnomadpy/localllm)
#   2. Open LocalLLM → Catalog → download a Gemma 4 .litertlm
#      (e.g. gemma-4-E2B-it or gemma-4-E4B-it from litert-community)
#   3. LocalLLM autostarts its HTTP server on :8099 with a signed bearer token
#   4. Copy the bearer token from LocalLLM → Settings

# In a Termux shell (deps already resolved by uv sync):
PITWALL_ADK_OPENAI_URL=http://localhost:8099/v1 \
PITWALL_ADK_OPENAI_MODEL=gemma-4-e2b-it \
PITWALL_ADK_OPENAI_API_KEY="<paste-token-from-LocalLLM-Settings>" \
python3 -m pitwall \
    --litert-model ~/storage/shared/Pitwall/models/gemma-4-E2B-it.litertlm
# Legacy aliases still work: PITWALL_LITERT_URL / PITWALL_LITERT_MODEL /
# PITWALL_LITERT_API_KEY — they emit a DeprecationWarning on first use.

The bridge sends POST /v1/chat/completions to LocalLLM with the bearer token; LocalLLM streams the response via SSE. Two APKs, one phone, one localhost hop, zero cloud. The hot-path E2B engine still loads in-process in the bridge for the < 100 ms warm/hot tier — only the paddock LLM moves to LocalLLM.

B. Dev workstation with Ollama / LM Studio / llama.cpp / vLLM¶

Same transport, different OpenAI-compatible server. Point PITWALL_ADK_OPENAI_URL at whatever you've got running:

# Ollama (macOS)
PITWALL_ADK_OPENAI_URL=http://localhost:11434/v1 \
PITWALL_ADK_OPENAI_MODEL=gemma2:2b \
python3 -m pitwall

# LM Studio:    PITWALL_ADK_OPENAI_URL=http://localhost:1234/v1
# llama.cpp:    PITWALL_ADK_OPENAI_URL=http://localhost:8080/v1
# vLLM:         PITWALL_ADK_OPENAI_URL=http://localhost:8000/v1
# (Legacy PITWALL_LITERT_URL still honoured with a DeprecationWarning.)

The bridge dials only localhost. No hosted LLM is involved at any point.

Retired recipes (per ADR-024): the in-process PITWALL_ADK_BACKEND=engine path and the separate-lit serve PITWALL_ADK_BACKEND=litertlm path were removed post-Sonoma. If you have a deployment pinned to either, the migration is always the same — install LocalLLM and point at it.