A fleet copilot that works in the demo but drifts in production costs more than a bad forecast: it costs the trust of dispatchers and drivers. Once a dispatcher clicks a fabricated shipment ID three times in a week, the copilot is dead, however accurate it is on the easy paths. We work with logistics and telematics platforms whose AI has already shipped and is showing its seams: token bills outrunning gross margin, fabricated IDs reaching production UIs, latency tails that scare operators.
Route optimisation is a well-studied deterministic problem with mature solvers. Asking the model to "pick the best order of stops" wastes tokens and drifts under load. Let the model handle ambiguity (driver requests, natural-language constraints), and let a typed route solver handle structure.
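Below is a minimal sketch of that split. It assumes a hypothetical `RouteConstraints` schema that the LLM fills from free text (here, "visit C before A"), which is then passed to a deterministic solver. The exhaustive permutation search is only a stand-in that is exact for a handful of stops. All names are illustrative.

```python
import itertools
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stop:
    stop_id: str
    x: float
    y: float


@dataclass
class RouteConstraints:
    # Hypothetical typed schema the LLM fills from driver text.
    # Each pair (a, b) means: visit stop a before stop b.
    precedence: list[tuple[str, str]] = field(default_factory=list)


def route_length(depot: Stop, order: tuple[Stop, ...]) -> float:
    # Depot -> each stop in order -> back to depot, straight-line distance.
    points = [depot, *order, depot]
    return sum(math.dist((p.x, p.y), (q.x, q.y)) for p, q in zip(points, points[1:]))


def satisfies(order: tuple[Stop, ...], constraints: RouteConstraints) -> bool:
    position = {s.stop_id: i for i, s in enumerate(order)}
    return all(position[a] < position[b] for a, b in constraints.precedence)


def solve(depot: Stop, stops: list[Stop], constraints: RouteConstraints) -> tuple[list[str], float]:
    # Exhaustive search: exact and deterministic, but only viable for a few stops.
    best, best_len = None, math.inf
    for order in itertools.permutations(stops):
        if not satisfies(order, constraints):
            continue
        length = route_length(depot, order)
        if length < best_len:
            best, best_len = order, length
    if best is None:
        raise ValueError("constraints cannot all be satisfied")
    return [s.stop_id for s in best], best_len


# The LLM's only job: turn "visit C before A" into this structure (stubbed here).
llm_output = {"precedence": [["C", "A"]]}
constraints = RouteConstraints(precedence=[tuple(p) for p in llm_output["precedence"]])
depot = Stop("DEPOT", 0, 0)
stops = [Stop("A", 0, 1), Stop("B", 0, 2), Stop("C", 0, 3)]
order, length = solve(depot, stops, constraints)
```

For real fleet sizes you would swap the brute-force loop for a proper vehicle-routing solver such as Google OR-Tools. The contract on the LLM side stays the same: it emits typed constraints and never a stop order.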
The model emits shipment IDs, carrier IDs, or driver names into UI elements. When an emitted ID doesn't match a real record, the button is broken. The fix follows the same pattern as in concrete dispatch: source IDs from tool JSON, never from the model's memory.
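One way to enforce that rule is a guard between the model and the UI. The guard collects every ID that tool calls actually returned, then drops any model-proposed action that points elsewhere. This is a sketch only: the `shipment_id` key, the `UIAction` shape, and the ID format are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class UIAction:
    # Hypothetical shape of a button or link the model proposes.
    label: str
    shipment_id: str


def collect_ids(tool_payloads: list[str], key: str = "shipment_id") -> set[str]:
    """Walk raw tool-call JSON responses and collect every string stored under `key`."""
    ids: set[str] = set()

    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key and isinstance(v, str):
                    ids.add(v)
                else:
                    walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for payload in tool_payloads:
        walk(json.loads(payload))
    return ids


def ground_actions(actions: list[UIAction], known_ids: set[str]) -> tuple[list[UIAction], list[UIAction]]:
    """Split proposed actions into grounded ones (safe to render) and fabricated ones (log and drop)."""
    grounded = [a for a in actions if a.shipment_id in known_ids]
    fabricated = [a for a in actions if a.shipment_id not in known_ids]
    return grounded, fabricated


payloads = ['{"shipments": [{"shipment_id": "SHP-1001"}, {"shipment_id": "SHP-1002"}]}']
proposed = [UIAction("Open", "SHP-1001"), UIAction("Open", "SHP-9999")]
grounded, fabricated = ground_actions(proposed, collect_ids(payloads))
```

A stricter variant has the model reference tool results by row index and lets the server look up the ID. With that design, the model never types an ID at all.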
Fleet-ops products increasingly call multiple providers (for example, Anthropic for reasoning, OpenAI for tool use, Gemini for vision). A tiered routing table maps each request type to the cheapest model that meets its quality bar. Build it once and re-benchmark it quarterly; the typical saving is 30–50% of LLM spend with no quality drop.
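Here is a minimal sketch of building such a table from benchmark results. The model names, prices, scores, and quality bars are illustrative placeholders, not real figures. In practice they would come from your own quarterly re-benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                    # hypothetical model identifier
    cost_per_mtok: float         # illustrative blended price, USD per million tokens
    quality: dict[str, float]    # benchmark score per request type, 0..1


def build_routing_table(options: list[ModelOption], min_quality: dict[str, float]) -> dict[str, str]:
    """For each request type, pick the cheapest model whose benchmark score meets the bar."""
    by_cost = sorted(options, key=lambda m: m.cost_per_mtok)
    table: dict[str, str] = {}
    for request_type, bar in min_quality.items():
        for model in by_cost:
            if model.quality.get(request_type, 0.0) >= bar:
                table[request_type] = model.name
                break
        else:
            raise ValueError(f"no model meets the quality bar for {request_type!r}")
    return table


# Placeholder benchmark results; replace with your own measurements.
options = [
    ModelOption("large-reasoner", 15.0, {"dispatch_reasoning": 0.93, "tool_call": 0.97, "photo_damage": 0.90}),
    ModelOption("mid-tool-user", 3.0, {"dispatch_reasoning": 0.85, "tool_call": 0.96, "photo_damage": 0.88}),
    ModelOption("small-fast", 0.5, {"dispatch_reasoning": 0.70, "tool_call": 0.91}),
]
min_quality = {"dispatch_reasoning": 0.90, "tool_call": 0.90, "photo_damage": 0.85}
table = build_routing_table(options, min_quality)
```

Because the table is derived from data rather than hand-edited, re-benchmarking only means updating the scores and rebuilding. A missing score counts as 0.0, so a model is never routed a request type it was not tested on.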
Live truck position, current shift status, and recent jobs are already available from the telematics API. A prefill layer that loads them into context up front replaces three to five Q&A turns per interaction.
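A minimal prefill sketch follows. It assumes a hypothetical `TelematicsClient` interface, since real telematics APIs differ per vendor, plus an in-memory fake so the example runs. The layer fetches the facts the copilot would otherwise ask for and renders them as a context block to prepend to the prompt.

```python
import json
from typing import Protocol


class TelematicsClient(Protocol):
    # Hypothetical interface: real telematics APIs differ per vendor.
    def position(self, truck_id: str) -> dict: ...
    def shift_status(self, driver_id: str) -> dict: ...
    def recent_jobs(self, truck_id: str, limit: int) -> list[dict]: ...


def build_prefill(client: TelematicsClient, truck_id: str, driver_id: str, job_limit: int = 5) -> str:
    """Fetch the facts the copilot would otherwise ask for and render them as a context block."""
    context = {
        "truck_id": truck_id,
        "driver_id": driver_id,
        "position": client.position(truck_id),
        "shift": client.shift_status(driver_id),
        "recent_jobs": client.recent_jobs(truck_id, limit=job_limit),
    }
    header = "Known context from telematics (do not ask the user for these):"
    return header + "\n" + json.dumps(context, indent=2)


class FakeTelematics:
    # In-memory stand-in so the sketch runs without a real API.
    def position(self, truck_id):
        return {"lat": 51.5, "lon": -0.12, "speed_kph": 0}

    def shift_status(self, driver_id):
        return {"on_shift": True, "hours_remaining": 3.5}

    def recent_jobs(self, truck_id, limit):
        return [{"job_id": f"JOB-{n}"} for n in range(limit)]


prompt_prefix = build_prefill(FakeTelematics(), truck_id="TRK-42", driver_id="DRV-7", job_limit=2)
```

The three lookups are independent, so a production version could fetch them concurrently to keep the prefill off the latency tail.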
AI cost audit against the 7-layer framework: orchestration, prompt discipline, caching, model selection, UX, infra, governance
Hybrid architecture separating what the LLM owns (ambiguity, language) from what solvers own (routing, ETA, constraint satisfaction)
Production controls the demo didn't need: observability, cost governance, kill-switches, rollback
Integration into legacy dispatcher UIs through typed UI contracts, not bolt-on widgets
Fabrication-rate eval harness as part of the release gate
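A fabrication-rate gate can be sketched in a few lines. It scans each captured copilot response for ID-shaped tokens and counts a case as fabricated if any of them were not returned by a tool in that run. The release is blocked when the rate exceeds a threshold. The `SHP-` ID format, the `EvalCase` shape, and the zero-tolerance default are assumptions for illustration.

```python
import re
from dataclasses import dataclass

ID_PATTERN = re.compile(r"\bSHP-\d{4,}\b")  # hypothetical shipment-ID format


@dataclass
class EvalCase:
    model_output: str     # copilot response captured for one eval prompt
    tool_ids: set[str]    # IDs the tools actually returned during that run


def fabricated_ids(case: EvalCase) -> set[str]:
    """IDs mentioned in the response that no tool returned."""
    return set(ID_PATTERN.findall(case.model_output)) - case.tool_ids


def fabrication_rate(cases: list[EvalCase]) -> float:
    """Fraction of cases containing at least one fabricated ID."""
    if not cases:
        raise ValueError("empty eval set")
    return sum(1 for c in cases if fabricated_ids(c)) / len(cases)


def release_gate(cases: list[EvalCase], max_rate: float = 0.0) -> tuple[bool, float]:
    """Return (passes, rate); CI blocks the release when passes is False."""
    rate = fabrication_rate(cases)
    return rate <= max_rate, rate


cases = [
    EvalCase("Open SHP-1001 now", {"SHP-1001"}),
    EvalCase("Try SHP-2002 instead", {"SHP-1001"}),
]
passes, rate = release_gate(cases)
```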