AI Harness Engineering Series — Book 1
"Most AI demos work.
Most AI products fail.
The difference is the harness."
AI Harness Engineering Series
Make Your LLM API and
CLI Tools Reliable
The Problem
Most AI books teach you to use the API. This one teaches you to wrap it — with the control infrastructure that separates a demo from a production system.
Sound Familiar?
What's Inside
Chapter 1
All seven techniques in one working example. Get something running before the deep dives.
Chapter 2
Versioning prompts like code. A/B testing variants. Preventing prompt drift in production.
Chapter 3
Schema enforcement on structured outputs. Catching malformed responses before they cause downstream failures.
Chapter 4
Handling rate limits, timeouts, and model degradation gracefully. When to retry, when to fallback, when to fail fast.
Chapter 5
Token metering per request and feature. Setting budgets. Catching runaway spend before it hits the bill.
Chapter 6
Timeouts that match UX expectations. Measuring p95. Strategies when the model is too slow.
Chapter 7
Keeping conversation history within token limits. Summarisation strategies. Sliding windows.
Appendices B–F
Hallucination control, rate limits, cost control, latency, context — each as a symptom → fix reference you can bookmark.
"A prompt without version control is a bug waiting to happen."
— Chapter 2: Prompt Management
"Trusting LLM output without validation is like running user input directly in SQL."
— Chapter 3: Validation
"Average latency is a lie. Your p95 is what pages you at 3am."
— Chapter 6: Latency Budgeting
"The context window is not a dumping ground. What you put in shapes what comes out."
— Chapter 7: Context Management
The Series
Book 1 — Available Now
145 pages · PDF · $25
Book 2 — Available Now
PDF · $25
Book 3 — Coming Soon
In progress
Book 4 — Coming Soon
Planned
Chapter 1 is free — all seven techniques in one working example. No email required. Download it, run the code, see if it's for you.