ADR-025 — Warm-path LitertCoach is LocalLLM-only¶
Status: Accepted Date: 2026-05-28 Closes: the open question in ADR-024 Relates to: ADR-017, ADR-018, ADR-022, ADR-024
Context¶
ADR-024 consolidated the paddock
ADK tier to a single LocalLLM HTTP transport and explicitly deferred the
warm-path question: LitertCoach.brief() / debrief() still carried two
transports — HTTP-to-LocalLLM (default) and in-process litert_lm.Engine
when PITWALL_ADK_OPENAI_URL="". That dual-transport state inherited
ADR-022's escape hatches and added its own:
LitertCoach.__init__ran a model-path probe (DEFAULT_MODEL_PATHS, five candidates), tried to importlitert_lm, opened an engine context manager, and tracked_engine,_engine_ctx,_init_errorstate._generatebranched onself._http_urlto pick HTTP or in-process.make_coach("auto")probedLitertCoach._llm is not Noneand fell back toRuleCoachon engine load failure.- A separate
litert_lm_model.py(LitertLmModel adapter for ADK) and itstest_litert_lm_model.pylived alongside, even though ADR-024 already retired the ADKenginebackend that called them. - The coaching conftest had to
os.environ["PITWALL_ADK_OPENAI_URL"] = ""at import time so warm-path tests wouldn't accidentally HTTP to a live LocalLLM.
Looking at what production actually did:
- Every shipped warm-path call in field-test logs went through the HTTP
path. The
.litertlmengine probe never produced a loaded engine on the Pixel build becauselitert_lmdoesn't ship on Termux. - The dual-transport branch existed for "desktop dev without LocalLLM
installed." That's a configuration nobody operates — desktop dev runs
Ollama / vLLM, which is the same
openai-compatible HTTP path. - The
make_coach("auto") → RuleCoachfallback existed for "engine fails to load." With HTTP it can't fail at construction; it fails at call time, whichbrief()/debrief()already handle via the no-fake-data policy (return empty + record friction) per ADR-018.
Same shape as ADR-024: a single-path system in a two-path coat.
Decision¶
LitertCoach is HTTP-only. It dials LocalLLM at
PITWALL_ADK_OPENAI_URL (default http://localhost:8099/v1) — the same
endpoint and the same LiteLlm contract the ADK paddock tier uses
(ADR-024). Warm and paddock now
share one transport story.
Concrete changes:
pitwall/features/coaching/litert_coach.pyis rewritten:- Constructor takes only
driver_level,max_tokens,temperature(kwarg-only).model_pathandbackendare gone. - No engine import, no model-path resolution, no
_engine/_engine_ctx/_init_error/_init_runtime/_resolve_model_path/close()/__del__. _generatecalls_generate_httpdirectly; the branch is gone.health()reports only the HTTP transport.-
The no-fake-data policy (ADR-018) is preserved:
brief()/debrief()return empty narratives + emit friction records when LocalLLM is unreachable. -
pitwall/features/coaching/coach_engine.pyloseslitert_model_pathandtflite_model_pathkwargs onmake_coach, loses the"tflite"alias, and loses thetry/exceptengine-load probe in the"auto"branch.make_coach("auto")andmake_coach("litert")now always return aLitertCoach; transport health is observed at call time, not construction. -
pitwall/features/coaching/litert_lm_model.pyis deleted. It was theBaseLlmadapter for ADK's retiredenginebackend (ADR-024 retired the backend; ADR-025 retires the adapter). It was already unused by production code post-ADR-024. -
Dead exports removed:
TfliteCoach(deprecated alias forLitertCoach) and_extract_assistant_text(engine response parser) are deleted fromlitert_coach.pyand dropped fromcoach_engine.py's__all__. -
Tests:
tests/features/coaching/test_litert_lm_model.py— deleted (tests the deleted adapter).tests/features/coaching/test_coach_engine_litert.py— deleted (in-process engine integration tests; the integration target no longer exists).tests/features/coaching/test_coach_engine.py— five tests that monkeypatchedLitertCoach._init_runtimeto simulate engine load/failure are rewritten to monkeypatch_generate_httpand simulate HTTP transport success/failure. Same coverage intent (friction record fires on backend failure; returns empty narrative; sink errors swallowed) over the new transport.-
tests/features/coaching/conftest.py— drops theos.environ["PITWALL_ADK_OPENAI_URL"] = ""opt-out (which only made sense when the engine path existed) and the matching legacy alias mutation. Tests now patch_generate_httpdirectly. -
VALID_EMOTIONSgainsfocused— the ADK system prompt advertises it in_COMMON_PREFIX, so the parser must accept it. (Caught while repairingtest_coach_ask_uses_intent_override; previously the parser silently downgradedfocused→neutral, dropping a valid emotion the LLM was explicitly told to emit.)
Configuration after the change¶
| Variable | Default | Used by |
|---|---|---|
PITWALL_ADK_OPENAI_URL |
http://localhost:8099/v1 |
Warm path and paddock tier (legacy: PITWALL_LITERT_URL) |
PITWALL_ADK_OPENAI_MODEL |
gemma3n-e2b |
Model id (legacy: PITWALL_LITERT_MODEL) |
PITWALL_ADK_OPENAI_API_KEY |
lit-serve-not-required |
Bearer token (legacy: PITWALL_LITERT_API_KEY) |
PITWALL_LITERT_HTTP_TIMEOUT_S |
30 |
Warm-path HTTP timeout (seconds) |
PITWALL_LLM_MAX_TOKENS |
512 |
Warm-path completion budget |
Retired: PITWALL_LITERT_SIDECAR_URL, PITWALL_LITERT_SIDECAR_MODEL,
PITWALL_LITERTLM_PATH, PITWALL_LITERTLM_BUDGET, and any expectation
that setting PITWALL_ADK_OPENAI_URL="" opts into an in-process engine
(it never did anything else — there's no engine to opt into).
Consequences¶
Positive:
- One coaching transport, end to end. The three-tier latency budgets
(Hot < 50 ms, Warm < 100–3 s, Paddock 2–15 s) still hold, but every
LLM call — warm or paddock — goes through the same
LiteLlm/ urllib HTTP shape against the same LocalLLM endpoint. LitertCoach.__init__is cheap and infallible. No model-path probe at boot, no native-lib load, no context manager to leak. Construction is ~10 lines of attribute setup.- The friction record now answers a sharper question — "was the configured LocalLLM endpoint reachable, and what did it return?" — rather than "did the configured backend (which?) load and respond?"
- Both
litert_lm_model.py(300 lines + tests) andtest_coach_engine_litert.py(300 lines) come out of the tree.
Negative:
- Operators who genuinely want an in-process LiteRT-LM warm path (none
in the field-test deployments) would need to re-introduce one. The
upstream
litert_lmpackage and the.litertlmartifact pipeline still exist; bringing back an in-process path is a code change, not an env flip. Acceptable given the field signal. - A fresh dev workstation needs some OpenAI-compatible server on
127.0.0.1for the warm path to produce text. Ollama on:11434, vLLM on:8000, or LocalLLM itself all work — pointPITWALL_ADK_OPENAI_URLat them. Without one, briefs and debriefs return empty narratives and record friction (which is the explicit no-fake-data policy from ADR-018, not a regression).
Neutral:
- The on-device guarantee from ADR-017
is unchanged — every supported transport is still
127.0.0.1. ADR-025 removed an implementation degree of freedom, not a product property. propose()still delegates toRuleCoachper ADR-017. LLM latency was never appropriate for sub-corner cues regardless of transport.
Validation¶
python -c "from pitwall.features.coaching.coach_engine import make_coach; c = make_coach('auto'); print(c.name, c.health())"→litert {'transport': 'http', 'http_url': 'http://localhost:8099/v1', ...}.grep -R 'litert_lm_model\|_engine_ctx\|_init_runtime\|DEFAULT_MODEL_PATHS' apps/edge-daemon/pitwallreturns no matches.apps/edge-daemon/tests/features/coaching/runs green for every coaching test that doesn't have a pre-existing unrelated failure.