RoboRacer Lab 02 — AEB safety demo

Closed-loop physics simulation + LLM tutor for F1Tenth autonomous emergency braking code.

What this is

A live demo of the LLM Coding Tutor applied to F1Tenth autonomous-racing coursework — specifically Lab 02, where students write a safety_node that must brake before colliding with obstacles. The grader runs two test layers — unit tests and a closed-loop physics simulation — and the AI tutor then reads both outputs to explain why anything failed, so students see the same defect surfaced from multiple angles in one report.

The demo deliberately models a real autonomous-vehicle safety property — the brake decision must release when the threat clears — that a naïve grader would silently let students get wrong.

Architecture

The pipeline runs entirely on GitHub infrastructure, with no local install required of the student:

  1. Student push — a commit to safety_node.py in the GitHub Classroom student repo fires the classroom.yml workflow.
  2. Workflow pulls the grader image — a private Docker image with ROS 2 Humble, pytest, scipy, and the gemini-python-tutor LLM provider chain baked in.
  3. Two test layers run inside the container, in parallel:
    • Unit tests — per-beam iTTC spec compliance, edge cases (NaN/Inf), false positives.
    • Closed-loop physics simulation — scipy.integrate.solve_ivp with event detection for sub-tick-precise collision and stop times. Five scenarios span the safety envelope: 5 m/s, 10 m/s, 15 m/s head-on; stationary; intermittent lidar glitch. Vehicle dynamics use a_max = 9.51 m/s² from the f1tenth_gym defaults.
      • In a future iteration: enable the full f1tenth_gym simulator (already present inside the container) so that the same workflow can also verify higher-level assignment deliverables — track laps, opponent-aware behavior, and realistic-lidar robustness — alongside the lightweight closed-loop check.
  4. AI tutor runs after the tests — it consumes the failing test output plus the student’s code and writes a paragraph or two explaining why each test broke, in plain language scaled to the student’s level. The tutor depends on the test results; it does not duplicate them.
  5. GitHub Step Summary aggregates the verdict and the tutor markdown next to the green/red badge on the commit. Per-scenario trajectory plots are produced as workflow artifacts (Step Summary itself does not reliably inline the rendered images at this size, so they download instead of preview).

[Figure: two consecutive runs of the F1TENTH Lab 2 Tests workflow in the GitHub Actions tab, as a student sees them: a green-checkmark run from the main branch with a correct safety_node (1m 50s) and a red-X run from the demo's deliberately buggy branch (1m 53s). Same workflow, same checks, opposite outcomes.]

The physics layer takes less than 1 second for all five scenarios (using sub-tick event detection in scipy.integrate.solve_ivp); the AI tutor stage adds another 10 to 20 s depending on provider and prompt size.
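As a rough illustration of the event-detection idea, the sketch below integrates a 1-D full-braking run with scipy.integrate.solve_ivp and two terminal events, one for collision and one for a full stop. It is not the grader's actual code: the function name simulate_braking, the state layout, the assumption that the brake is held from t = 0, and the max_step value are all hypothetical choices made for this example. Only a_max = 9.51 m/s² comes from the text above.

```python
from scipy.integrate import solve_ivp

A_MAX = 9.51  # m/s^2, deceleration limit quoted in the architecture list


def simulate_braking(v0, gap0, t_max=10.0):
    """Hypothetical helper: full braking from t = 0, state = [gap, speed]."""
    if v0 <= 0.0:
        # Stationary scenario: nothing to integrate.
        return {"collided": False, "t_end": 0.0, "gap": gap0, "speed": 0.0}

    def rhs(t, y):
        # The gap shrinks at the current speed; speed falls at constant A_MAX.
        return [-y[1], -A_MAX]

    def collided(t, y):
        return y[0]  # crosses zero when the gap closes

    def stopped(t, y):
        return y[1]  # crosses zero when the vehicle comes to rest

    for event in (collided, stopped):
        event.terminal = True   # stop integrating at the first event
        event.direction = -1    # only downward zero crossings count

    # A small max_step keeps the solver from stepping over a collision
    # (the gap could otherwise cross zero twice inside one large step).
    sol = solve_ivp(rhs, (0.0, t_max), [gap0, v0],
                    events=[collided, stopped], max_step=0.01)

    return {
        "collided": sol.t_events[0].size > 0,
        "t_end": float(sol.t[-1]),      # event time, located between solver steps
        "gap": float(sol.y[0, -1]),
        "speed": float(sol.y[1, -1]),
    }
```

Under these assumptions, simulate_braking(10.0, 8.0) stops short of the obstacle, consistent with the closed-form stopping distance v0² / (2 · a_max) = 100 / 19.02 ≈ 5.26 m, while simulate_braking(15.0, 8.0) reports a collision because the same formula gives about 11.83 m.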

Why the brake decision releases when the threat clears

The F1Tenth ESC tracks the last received /drive target, the same behavior as a real automotive ECU. While the student’s safety_node keeps publishing drive.speed = 0.0 (because iTTC is still below the engagement threshold), the vehicle decelerates. When the node stops publishing that brake command, the ESC’s speed target reverts and the vehicle stops decelerating.
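A minimal sketch of that release behavior, under stated assumptions: a fixed-step loop in which the speed target is 0.0 only on ticks where a brake command is published and otherwise reverts to the cruise speed. The tick length, the rate-limited tracking of the target, and the names simulate and brake_published are hypothetical and chosen only for illustration; the grader's real model may differ.

```python
A_MAX = 9.51  # m/s^2, from the f1tenth_gym defaults mentioned earlier
DT = 0.01     # s, hypothetical control tick


def simulate(v_cruise, gap0, brake_published, t_max=5.0):
    """Return 'stopped', 'collision', or 'timeout' for a 1-D approach.

    brake_published(tick) -> bool says whether the safety node published
    drive.speed = 0.0 on that tick (hypothetical interface).
    """
    speed, gap = v_cruise, gap0
    step = A_MAX * DT  # largest speed change allowed per tick
    for tick in range(int(round(t_max / DT))):
        # The target is zero only while the brake command keeps arriving;
        # otherwise it reverts to the cruise speed.
        target = 0.0 if brake_published(tick) else v_cruise
        speed += max(-step, min(step, target - speed))
        gap -= speed * DT
        if gap <= 0.0:
            return "collision"
        if speed <= 0.0:
            return "stopped"
    return "timeout"


# Brake held on every tick versus a single "fire once" emission.
sustained = simulate(10.0, 8.0, lambda tick: True)
fire_once = simulate(10.0, 8.0, lambda tick: tick == 0)
```

With these illustrative numbers the sustained brake comes to rest before the obstacle, while the single emission barely reduces speed before the target reverts and the run ends in a collision.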

The grader models that faithfully. A naïve student whose code emits brake once on a single noisy lidar beam (no filtering, no multi-frame confirmation, no hysteresis) would produce a well-known class of production AV defect: a car that stops in the middle of the road from a single misperception and never recovers — phantom braking. The grader’s reference implementation uses a production-pattern hysteresis controller (engage at iTTC < 1.5 s, release at iTTC > 1.65 s, the standard engage/release split in production ADAS). A student who internalizes “fire once, walk away” finds their car still passes the unit tests but under-decelerates and collides in the physics simulation.

This is the pedagogical bet: a grader that fails the right way teaches the right pattern. A grader that latches the brake permanently after the first emission would silently approve the phantom-braking design.
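The engage/release split can be sketched as a two-state controller. This is an illustrative version only, not the grader's reference implementation: the class name HysteresisBrake and its update method are hypothetical, and the only values taken from the text are the 1.5 s engage and 1.65 s release thresholds.

```python
from dataclasses import dataclass

ENGAGE_ITTC_S = 1.5    # engage when the minimum iTTC drops below this
RELEASE_ITTC_S = 1.65  # release only once the minimum iTTC rises above this


@dataclass
class HysteresisBrake:
    """Hypothetical two-state brake decision with an engage/release gap."""

    braking: bool = False

    def update(self, min_ittc: float) -> bool:
        if not self.braking and min_ittc < ENGAGE_ITTC_S:
            self.braking = True
        elif self.braking and min_ittc > RELEASE_ITTC_S:
            self.braking = False
        # Between the two thresholds the previous decision is kept,
        # which avoids chattering when iTTC hovers near 1.5 s.
        return self.braking


controller = HysteresisBrake()
decisions = [controller.update(ittc) for ittc in (2.0, 1.4, 1.55, 1.6, 1.7, 1.55)]
```

In this run the brake engages at 1.4 s, stays engaged at 1.55 s and 1.6 s because neither value exceeds the release threshold, and releases at 1.7 s. The final 1.55 s reading does not re-engage it because it is still above the engage threshold.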

The bug we’re showing (and why it teaches)

The demo opens on the buggy state — ranges[len(ranges) // 2] — and walks through the failing test message. Then the student fixes to the scalar (min(finite_ranges) / speed) and reruns: now 10/11. One test still fails — the lateral-only test — and the tutor reads the new failure and explains the per-beam requirement. The student fixes to per-beam iTTC and reruns: 11/11.
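The three stages of that walkthrough can be compared side by side. The sketch below is an assumption-laden illustration rather than the lab's reference solution: the function names are hypothetical, the per-beam formula used is range divided by the closing speed along each beam (speed · cos(angle), clamped at zero), and angle_min and angle_increment follow the usual LaserScan field names. The lateral-only case is assumed here to mean a close obstacle directly to the side that the vehicle is not approaching.

```python
import numpy as np


def center_beam_ittc(ranges, speed):
    """Buggy state: looks only at the middle beam."""
    return ranges[len(ranges) // 2] / speed


def scalar_ittc(ranges, speed):
    """First fix: nearest finite range divided by forward speed."""
    finite_ranges = [r for r in ranges if np.isfinite(r)]
    return min(finite_ranges) / speed


def per_beam_ittc(ranges, angle_min, angle_increment, speed):
    """Second fix: one iTTC per beam, using the closing speed along that beam."""
    ranges = np.asarray(ranges, dtype=float)
    angles = angle_min + angle_increment * np.arange(ranges.size)
    closing = np.maximum(speed * np.cos(angles), 0.0)
    ittc = np.full(ranges.shape, np.inf)  # no closing motion means no collision time
    valid = np.isfinite(ranges) & (closing > 0.0)  # skips NaN/Inf returns
    ittc[valid] = ranges[valid] / closing[valid]
    return ittc


# Three beams at -90°, 0°, +90°: a wall 0.3 m to the side, clear road ahead,
# and one invalid return.
scan = [0.3, 20.0, float("nan")]
scalar = scalar_ittc(scan, speed=5.0)
per_beam = per_beam_ittc(scan, -np.pi / 2, np.pi / 2, speed=5.0)
```

Here the scalar version reports 0.06 s because of the side wall, which would trigger a false brake, while the smallest per-beam value is 4.0 s from the forward beam.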

The story is two voices converging on the same defect:

Scales without compromising access. Grading and feedback run entirely on GitHub’s infrastructure, so a student needs nothing but a browser — no local Python install, no Docker, no GPU, no toolchain to keep working. The same configuration that serves a well-resourced classroom also reaches students who join from older laptops, constrained or intermittent bandwidth, or regions where the power grid is unreliable. A human TA cannot read every commit from every student across every section every week; the platform can — at the same per-commit cost whether the student is in the next room or on the other side of the world.

Roadmap — full f1tenth_gym in CI

The current physics simulation is a 1-D longitudinal ODE solver: enough to expose threshold-too-aggressive, missing-hysteresis, and single-tick noise-handling defects, and small enough to add ~100 ms per push to the autograder. The next step is to wire the full f1tenth_gym simulator into the CI pipeline so the same student commit that triggers unit tests also runs against:

The technical blockers are not architectural and not about image size: f1tenth_gym and f1tenth_gym_ros are already installed in the grader image alongside ROS 2 Humble, rclpy, and numba. What’s missing is the wiring. The remaining work is mostly about CI runtime budget and the orchestration layer:

Path forward: a tiered grading model, where every push gets the fast 1-D physics layer (current), and a gym-full reusable workflow runs on a manual dispatch or on a slower nightly schedule. Students opt in for the deeper check; the autograder remains snappy for the inner loop. This is the same pattern the platform already uses for the heavier Heex monitoring layer.

Once gym is in CI, the same demo can be re-recorded with a real Levine hallway lap instead of a 1-D wall — and the AV-safety design rationale becomes a video of a vehicle that actually drives.

The grader image, classroom workflow, AI tutor, and physics simulation are all separately versioned. Replacing one piece does not touch the others — the same architectural property that lets the platform serve introductory Python and autonomous-vehicle coursework with the same backbone.