ML Models for Telemetry¶

Models built and tested on 183 VBO sessions (535K frames, 14.9 hours, 8 tracks). Held-out evaluation on an entirely unseen track.

Data Profile: Full Dataset (52 Hot Lap Sessions)¶

From 456,711 hot lap frames across 3 primary tracks (Sonoma, Track 2, Track 8):

Phase	Frames	% of Driving	Key Insight
Cornering (powered)	199,584	43.7%	Largest phase. Corner speed + exit speed are the primary coaching targets.
Straight	66,297	14.5%	Full throttle — no coaching needed.
Transition	65,694	14.4%	Between phases — smoothness matters.
Cornering (coast)	45,906	10.1%	In a corner but not on throttle — coaching opportunity.
Braking	40,399	8.8%	Brake point and pressure. Peak: 107 bar. P95: 28.4 bar.
Coasting (wasted)	28,734	6.3%	#1 coaching target. ~6s per lap doing nothing.
Trail braking	10,097	2.2%	Rare, highest-skill. Only 2.2% but enormous value per second.

Training split (held-out by track, not random):

Split	Data	Frames	Purpose
Train	Sonoma (80%) + Track 2	292,944	Learn from 2 tracks
Val	Sonoma (20% held-out sessions)	50,737	Same track, different sessions
Test	Track 8 (entirely unseen)	92,972	Cross-track generalization

Model 1: Driving Phase Classifier¶

What: Classify each frame into a driving phase: braking, trail-braking, cornering, accelerating, straight, coasting.

Why: Phase detection is the foundation for all other models. The sonic model needs to know "we're in trail braking right now" to play the right tone. The coaching engine needs to know "driver is coasting" to fire the right pedagogical vector.

Architecture:

graph LR
    FRAME["Frame features:<br/>speed, gLat, gLong,<br/>brake, throttle,<br/>steering, combo_g"] --> MODEL[Gradient Boosted Tree<br/>XGBoost, 6 classes]
    MODEL --> PHASE["Phase:<br/>braking | trail_brake |<br/>cornering | accel |<br/>straight | coasting"]

Features (7):

Feature	Why
speed (m/s)	Differentiates high-speed vs low-speed phases
g_lat (G)	Cornering intensity
g_long (G)	Braking/acceleration intensity
brake_pressure (bar)	Separates braking from cornering
throttle (%)	Separates acceleration from coasting
steering (degrees)	Separates turning from straight
combo_g (G)	Overall grip usage

Labels: Derived from thresholds on the Gold Standard lap:

def label_phase(frame):
    if frame.brake > 5 and abs(frame.g_lat) > 0.4:
        return "trail_brake"
    if frame.brake > 5:
        return "braking"
    if abs(frame.g_lat) > 0.4 and frame.throttle > 20:
        return "cornering"      # powered cornering
    if abs(frame.g_lat) > 0.4:
        return "cornering"      # coasting through corner
    if frame.throttle > 50 and abs(frame.g_lat) < 0.3:
        return "straight"
    if frame.throttle < 10 and frame.brake < 2:
        return "coasting"
    return "accelerating"

Model: XGBoost classifier. 100 trees, max depth 5. Size: ~100KB. Inference: <1ms.

Training data: Label every frame in the Gold Standard lap. Train on 70%, validate on 30%. Expected accuracy: >95% (phases are well-separated in feature space).

Use: The phase label feeds into the sonic model as a primary input. Instead of hand-coded if/else for each tone layer, the sonic model says "we're in trail_brake phase → play trail brake tone."

Model 2: Brake Point Predictor¶

What: Predict the optimal brake point (distance from corner entry) for each corner, given approach speed.

Why: The sonic model needs to know when to fire the brake approach tone. Currently it uses fixed geofence distances per corner. A trained model adapts to the actual speed.

Architecture:

graph LR
    INPUT["Approach speed (m/s)<br/>Corner severity (1-6)<br/>Corner direction (L/R)<br/>Track gradient (degrees)"] --> MODEL[Linear Regression<br/>or small MLP]
    MODEL --> OUTPUT["Optimal brake point<br/>(meters before apex)"]

Training data: From the Gold Standard lap, extract:

for each corner:
    approach_speed = speed at 200m before corner entry
    brake_start = distance where brake_pressure first exceeds 5 bar
    brake_point = corner_entry_distance - brake_start

    → (approach_speed, severity, direction, gradient) → brake_point

One session gives 12 corner samples (12 corners × 1 lap). Ten sessions = 120 samples. AJ's lap gives the "correct" brake points.

Model: Linear regression for v1 (brake_point ≈ a * speed + b * severity + c). Upgrade to small MLP (2 layers, 16 neurons) if non-linear effects matter (they will — aero downforce makes high-speed braking points shorter than linear prediction).

Use: The sonic model fires the brake approach tone at predicted_brake_point + margin meters before the corner. The margin decreases as the model's confidence increases (more data = tighter timing).

Model 3: Lap Time Predictor¶

What: Predict the final lap time from partial telemetry (after each sector).

Why: Audio chime at sector boundaries tells the driver if they're ahead or behind pace.

Architecture:

graph LR
    INPUT["Completed sector times<br/>Speed at sector boundary<br/>Brake events count<br/>Avg combo_g in sector"] --> MODEL[Multivariate Linear<br/>Regression]
    MODEL --> OUTPUT["Predicted lap time<br/>(seconds)"]

Training data: Split each lap into 3 sectors. After sector 1, predict the full lap from sector 1 time + sector 1 telemetry stats. After sector 2, prediction improves with more data.

features_after_sector_1 = [
    sector_1_time,
    sector_1_avg_speed,
    sector_1_max_glat,
    sector_1_brake_events,
    sector_1_coast_time,
]
label = full_lap_time

Model: Linear regression. With 8+ laps per session, even one session gives enough data for a per-track model. Prediction improves through the lap (sector 1 alone: ±3s accuracy, sector 2: ±1s, sector 3: ±0.3s).

Use: At each sector boundary, play ascending chimes (ahead of prediction) or descending (behind). The driver instantly knows their pace without looking at a screen.

Model 4: Corner Performance Scorer¶

What: Score each corner pass 0-100 against the Gold Standard.

Why: The post-session report card needs a single number per corner. The sonic model can use it for end-of-corner chimes (high score = ascending chime, low = descending).

Architecture:

graph LR
    INPUT["Corner telemetry:<br/>entry_speed, min_speed,<br/>exit_speed, trail_brake_pct,<br/>max_glat, corner_time,<br/>steering_smoothness"] --> MODEL[Multi-Output Regression<br/>or Weighted Formula]
    MODEL --> OUTPUT["Corner score: 0-100<br/>Sub-scores per dimension"]

Scoring dimensions:

Dimension	Weight	100 =	0 =
Entry speed	15%	≥ AJ's entry speed	< 80% of AJ's
Min speed	20%	≥ AJ's min speed	< 70% of AJ's
Exit speed	25%	≥ AJ's exit speed	< 75% of AJ's
Corner time	20%	≤ AJ's corner time	> 130% of AJ's
Trail brake quality	10%	Smooth release, brake at apex	No trail brake or abrupt release
Smoothness	10%	Low steering variation	High steering corrections

Model: Weighted formula for v1 (no ML needed — just normalized comparison to Gold Standard). Upgrade to a trained regressor if you want the model to learn which dimensions matter most for lap time.

Use: Corner score drives the end-of-corner chime in the sonic model. Score > 80 = ascending chime. Score < 50 = descending. Between = neutral. Over time, the driver hears the chime pattern and internalizes which corners need work.

Model 5: Driving Style Fingerprint¶

What: Cluster telemetry patterns into a driver style profile that evolves over sessions.

Why: Coaching should adapt to the driver. An aggressive late-braker needs different advice than a smooth trail-braker.

Architecture:

graph LR
    subgraph Per-Corner Feature Extraction
        FRAMES[Corner telemetry<br/>all corners, all laps] --> FEATURES["Feature vector:<br/>peak_brake_g, brake_point,<br/>throttle_ramp_rate,<br/>friction_util, trail_overlap,<br/>line_consistency, coast_time"]
    end

    FEATURES --> CLUSTER[K-Means or HDBSCAN<br/>on feature vectors]
    CLUSTER --> PROFILE["Driver archetype:<br/>Aggressive Late-Braker<br/>Smooth Trail-Braker<br/>Cautious Early-Braker<br/>Inconsistent"]

    PROFILE --> ADAPTATION[Coaching adaptation:<br/>Aggressive → more smoothness cues<br/>Cautious → more commitment cues<br/>Inconsistent → more consistency drills]

Feature vector per corner per lap (7 features):

features = [
    peak_brake_g,           # MAX(|g_long|) during braking phase
    brake_point_distance,   # how early they brake vs AJ
    throttle_ramp_rate,     # d(throttle)/dt from apex to exit
    friction_circle_util,   # AVG(combo_g / max_g) through corner
    trail_brake_overlap,    # frames where brake > 5 AND |gLat| > 0.4
    steering_stddev,        # steering variation through corner (smoothness)
    coast_duration,         # time with no throttle and no brake
]

Model: K-Means (k=4 archetypes) for initial clustering. One session gives 12 corners × N laps = ~96 feature vectors. Enough for clustering after 2-3 sessions.

Use: The driver's archetype adjusts the sonic model's behavior: - Aggressive → grip tone is more prominent (they need limit awareness) - Cautious → brake approach tone extends further (they need to commit later) - Inconsistent → lap estimate chimes are more frequent (they need consistency feedback)

Model 6: Optimal Tone Timing (Learned Sonic Model)¶

What: Learn the mapping from telemetry → audio parameters from driver feedback data.

Why: The hand-tuned sonic model (v1) uses fixed thresholds. A trained model can learn when the driver actually responded well to a tone vs when they ignored it, and adjust timing accordingly.

Architecture:

graph LR
    subgraph Training Data Collection
        CUE[Sonic cue fired at time T] --> RESPONSE["Driver response within 2s?<br/>brake applied / throttle change /<br/>steering correction"]
        RESPONSE --> LABEL["Positive: response within 1s<br/>Negative: no response or > 2s<br/>Dangerous: cue during distraction"]
    end

    subgraph Model
        FRAME["Frame features +<br/>cue parameters"] --> NN[Small MLP<br/>3 layers × 32 neurons]
        NN --> TIMING["Optimal cue timing:<br/>fire now / wait / don't fire"]
    end

Training data: The simulator exports frame + cue pairs (the CSV we just generated: 8,273 rows). After real sessions, we also capture the driver's response — did the brake tone lead to braking? Did the throttle pulse lead to throttle application?

for each cue that fired:
    response_time = time until driver action matching the cue
    if response_time < 1.0:
        label = "effective"     # driver responded quickly
    elif response_time < 2.0:
        label = "late"          # driver responded slowly — fire earlier next time
    elif response_time > 5.0:
        label = "ignored"       # driver didn't respond — wrong cue or wrong time

Model: Small MLP (3 layers, 32 neurons). Input: telemetry features + proposed cue parameters. Output: probability the driver will respond effectively. Size: ~20KB. Inference: <1ms.

Use: Before firing a cue, the sonic model queries this model: "If I fire a brake approach tone right now, what's the probability the driver responds?" If probability < 0.3, delay the cue. This makes the sonic model adaptive to each driver's reaction time and style.

Model 7: Anomaly Detection for Car Health¶

What: Detect abnormal telemetry patterns that indicate mechanical problems.

Why: Catch brake fade, overheating, pressure drops before they become dangerous.

Architecture:

graph LR
    BASELINE["Healthy session baseline<br/>(first 5 sessions)"] --> ISO[Isolation Forest<br/>multivariate anomaly detection]
    LIVE["Live frame features:<br/>brake_pressure vs gLong ratio,<br/>coolant_temp trend,<br/>oil_pressure vs rpm"] --> ISO
    ISO --> SCORE["Anomaly score<br/>normal / warning / critical"]

Monitored relationships:

Relationship	Normal	Anomaly
brake_pressure vs gLong	Linear — more pressure = more G	Ratio decreasing = brake fade
coolant_temp over session	Rises then stabilizes at 85-95°C	Keeps rising past 100°C = cooling issue
oil_pressure vs rpm	Higher RPM = higher pressure	Pressure drops at high RPM = pump issue
combo_g max per corner	Consistent lap-to-lap	Decreasing = tire degradation or driver fatigue

Model: Isolation Forest (scikit-learn). Train on the first 3-5 healthy sessions. Flag frames with anomaly score > threshold.

Use: If anomaly detected, the sonic model plays a distinct warning pattern (different from grip/brake/throttle tones). On the post-session dashboard, anomalous frames are highlighted for the race engineer.

Model Summary and Training Pipeline¶

graph TB
    VBO[VBO Files<br/>10Hz telemetry] --> PARSE[VBO Parser]
    PARSE --> DUCK[(DuckDB<br/>session store)]

    DUCK --> M1[Model 1: Phase Classifier<br/>XGBoost, ~100KB, <1ms]
    DUCK --> M2[Model 2: Brake Point Predictor<br/>Linear Reg, ~5KB, <1ms]
    DUCK --> M3[Model 3: Lap Time Predictor<br/>Linear Reg, ~5KB, <1ms]
    DUCK --> M4[Model 4: Corner Scorer<br/>Weighted formula, 0KB, <1ms]
    DUCK --> M5[Model 5: Style Fingerprint<br/>K-Means, ~50KB, <1ms]
    DUCK --> M6[Model 6: Tone Timing<br/>MLP 3x32, ~20KB, <1ms]
    DUCK --> M7[Model 7: Anomaly Detection<br/>Isolation Forest, ~200KB, <5ms]

    M1 --> SONIC[Sonic Co-Driver<br/>real-time tone generation]
    M2 --> SONIC
    M3 --> SONIC
    M4 --> SONIC
    M5 --> SONIC
    M6 --> SONIC
    M7 --> SONIC

Total model size: ~380KB. All models run in <5ms combined. Fits on any device including Pixel 10.

Training data required:

Model	Min Sessions	Min Frames	When Usable
Phase Classifier	1 (Gold Standard)	8K	Day 1 (pre-trained on AJ's lap)
Brake Point Predictor	1	12 corners	Day 1
Lap Time Predictor	3+ laps	36 sectors	Lap 4 of first session
Corner Scorer	1 (Gold Standard)	12 corners	Day 1
Style Fingerprint	2-3 sessions	200+ corners	Session 3
Tone Timing	5+ sessions with response data	5000+ cue-response pairs	Session 5+
Anomaly Detection	3-5 sessions	25K+	Session 5+

Models 1-4 work from the first session. Models 5-7 improve over time. The system is useful on day 1 and gets better every session.