AI for Concrete Dispatch Software

Concrete dispatch is among the hardest operational workflows to get AI right on. Orders must be placed against real plants, real trucks, real drivers, and real pour windows. A fabricated ticket ID or a wrong quantity isn't a "hallucination"; it's a logistics failure that cascades through a whole shift. We've taken a production AI copilot for a US concrete-dispatch SaaS from 75K tokens per happy-path conversation to under 26K, and eliminated the fabricated-ID failure class entirely, all on the same model.

The LLM is reading full JSON order payloads every turn

Past-orders and order-detail tools return raw JSON with dozens of fields per order. The model re-reads it, re-summarises it, and burns tokens on it in every subsequent turn. A thin server-side normaliser that passes through only the fields the conversation needs cuts that cost by ~75% per tool call.
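As a rough illustration of what such a normaliser might look like, the sketch below renders each order as one compact line and drops everything else. The field names in `KEEP_FIELDS` and the function names are assumptions made for the example, not the schema of any particular dispatch system.

```python
from typing import Any

# Hypothetical field names; a real order payload will use its own schema.
KEEP_FIELDS = ("order_id", "date", "mix", "quantity_yd3", "address", "status")


def normalise_order(raw: dict[str, Any]) -> str:
    """Render one order as a single compact line, skipping unlisted or empty fields."""
    parts = [f"{key}={raw[key]}" for key in KEEP_FIELDS if raw.get(key) is not None]
    return " | ".join(parts)


def normalise_orders(raw_orders: list[dict[str, Any]]) -> str:
    """Render a list of orders as one line per order for the tool result."""
    return "\n".join(normalise_order(order) for order in raw_orders)
```

The tool would return the output of `normalise_orders` instead of the raw JSON, so the model only ever sees the handful of fields it needs to talk about an order.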

Buttons and map-pickers are emitted by the LLM

UI element choice (button labels, option arrays, when to render the map) travels through the model as tool instructions and tool outputs. You pay for UI in tokens twice: once to instruct the model, and again to retry when it skips a required render. Moving UI emission into deterministic code saves those tokens, and because every ID on screen then comes from the database rather than from model output, fabricated IDs become structurally impossible.
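A minimal sketch of the idea, assuming the server builds the buttons directly from past-order rows and validates every click against that same set. The `Button` type, the field names, and the label format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Button:
    label: str
    order_id: str


def build_reorder_buttons(past_orders: list[dict]) -> list[Button]:
    """Build one button per real past order; labels and IDs come from stored rows."""
    return [
        Button(
            label=f"{o['date']}: {o['quantity_yd3']} yd3 of {o['mix']}",
            order_id=o["order_id"],
        )
        for o in past_orders
    ]


def resolve_click(buttons: list[Button], clicked_id: str) -> str:
    """Accept only IDs that were actually rendered; anything else is rejected."""
    valid_ids = {b.order_id for b in buttons}
    if clicked_id not in valid_ids:
        raise ValueError(f"unknown order id: {clicked_id!r}")
    return clicked_id
```

In this arrangement the model never types an order ID at all. It can at most choose among options the code has already rendered.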

Reorder flow re-asks for data the system already has

Load size, truck spacing, quantity, address — all in the past-orders table. A well-designed reorder prefills all of them server-side in one step; a naive one turns them into four follow-up questions and four extra LLM turns.
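One way the server-side prefill could look, as a sketch under assumed names: the draft is copied from the past-order row in a single step, and only the fields still empty are left for the conversation. `OrderDraft`, its fields, and the choice of `pour_start` as the one open question are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderDraft:
    address: str
    quantity_yd3: float
    load_size_yd3: float
    truck_spacing_min: int
    pour_start: Optional[str] = None  # assumed to be the only field a reorder must still ask for


def prefill_reorder(past: dict) -> OrderDraft:
    """Copy everything the past order already answers into a new draft."""
    return OrderDraft(
        address=past["address"],
        quantity_yd3=past["quantity_yd3"],
        load_size_yd3=past["load_size_yd3"],
        truck_spacing_min=past["truck_spacing_min"],
    )


def missing_fields(draft: OrderDraft) -> list[str]:
    """Names of the fields the user still has to supply."""
    return [name for name, value in vars(draft).items() if value is None]
```

With a draft like this, the assistant asks one question (when to pour) instead of walking through load size, spacing, quantity, and address again.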

System prompt carries rules the code could enforce

In a typical dispatch prompt, 40–60% of the content is rules that the surrounding application could enforce deterministically. Every turn pays for those rules in tokens. Stripping them out is the fastest structural win.
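To make this concrete, here is a small sketch of two rules that might otherwise live in the prompt ("never accept a quantity outside the allowed range", "never book a pour with too little lead time") expressed as a plain validation function. The limits and names are invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical business limits; real values would come from plant configuration.
MIN_QUANTITY_YD3 = 1.0
MAX_QUANTITY_YD3 = 500.0
MIN_LEAD_TIME = timedelta(hours=2)


def validate_order(quantity_yd3: float, pour_start: datetime, now: datetime) -> list[str]:
    """Return a list of rule violations; an empty list means the order may proceed."""
    errors = []
    if not (MIN_QUANTITY_YD3 <= quantity_yd3 <= MAX_QUANTITY_YD3):
        errors.append(
            f"quantity must be between {MIN_QUANTITY_YD3} and {MAX_QUANTITY_YD3} yd3"
        )
    if pour_start - now < MIN_LEAD_TIME:
        errors.append("pour start must be at least 2 hours from now")
    return errors
```

Once a check like this runs on every order submission, the corresponding sentences can be deleted from the system prompt, and the rule holds whether or not the model remembers it.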

What we do

Per-call token and cost observability so every regression is a query, not an investigation

Server-side state machine for the order / reorder / reschedule flow — LLM stops driving, code does

Tool-output normalisation layer — compact text replaces raw JSON, 3–5× fewer tokens per tool call

UI emission pulled out of the model — fabricated IDs and dropped buttons become structurally impossible

Eval harness that watches the class of errors that matter (fabricated IDs, quantity drift, wrong plant) across every release
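As an illustration of the kind of check an eval harness for the fabricated-ID class might run, the sketch below extracts every order ID mentioned in a model reply and flags any that were not in the data the model was given. The `ORD-<digits>` format is an assumption for the example; a real system would match its own ID scheme.

```python
import re

# Hypothetical ID format; adjust the pattern to the real ticket/order scheme.
ORDER_ID_PATTERN = re.compile(r"\bORD-\d+\b")


def fabricated_ids(reply: str, known_ids: set[str]) -> set[str]:
    """Return IDs that appear in the reply but not in the known set."""
    return set(ORDER_ID_PATTERN.findall(reply)) - known_ids
```

Run across a fixed set of recorded conversations on every release, a non-empty result from a check like this is a regression signal for that error class.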
