Lesson 09 · Design Before You Build

How to write an AI agent prompt, like a spec

A prompt is a wish; a spec is a testable instruction. How to write an AI agent prompt as requirements with numbers, so it does the same thing every time and you can prove it.

Course
AI Agents for Business
Lesson
09
Outcome
The Spec

Free workbook

The AI Agents Workbook

Every lesson in one fill-in workbook, the scorecards, templates, and checklists.

Plus the occasional email from me, Adham. Only when I've built something genuinely useful, always about making AI actually work for a UAE business. Unsubscribe anytime.

Most people write an AI agent prompt as a wish: "be helpful and professional." Then they wonder why it answers differently every time and nobody can tell if it is right. A prompt is a wish. A spec is a testable instruction. The way to write an AI agent prompt that actually works is to stop writing prose and start writing requirements, with numbers, so the agent does the same thing every time and you can prove it.

Why a clever prompt still fails in production

Two prompts demo well and break quietly. The vibe prompt: "be helpful and professional" is helpful by whose measure? The same enquiry gets a different reply each time. The mega-prompt: three pages of instructions piled up over weeks, where nobody can say which line does what, so nobody can test it, fix it, or trust it. Flight software is never given a wish: every behaviour is a numbered requirement with a way to verify it. That is the only reason you can trust a machine you can never touch again, and it is how an agent earns the same trust.

A prompt versus a spec

A prompt tells the AI what you want. A spec tells you when it is wrong. The test of a real requirement: could two people disagree on whether it was met?If yes, it is still a vibe, add the number. "Reply quickly" cannot be tested; "acknowledge within 120 seconds" can be checked on every single reply. In flight terms this is the Software Requirements Specification, the SRS: it takes each block of your design and writes, in numbered lines, exactly what it must do.

Write one requirement: the five lenses

One requirement is a single "shall" with five things attached, so anyone can build to it and prove it. And five lenses make sure nothing is missed, most AI projects only ever write the first. Functional, what it does. Operational, the conditions it runs in (channels, languages, hours). Performance, how well, with a number, this is the one people skip and the one with the figures. Implementation, how it is built. Validation, how it is proven. Every line also names a parent (the block or job it serves), a rationale (the cost it prevents), and a verification method.

Set the budgets first

Before you write the lines, set the ceilings, the way a satellite allocates a power or pointing budget. A latency budget (say 120 seconds total, split across the blocks), an accuracy budget (price 100%, language 99%), a cost budget (under a set amount per conversation, so it holds at volume), and an escalation budget (hand to a human under a set rate). Then every requirement has to fit inside the budgets, and trade-offs are explicit instead of discovered in production.

A worked example: turning a vibe into a spec

Take the review step of the clinic agent. The vibe was "reply nicely and do not mess up the price." Specced, it becomes testable lines: the agent shall reply in the language received (functional); shall use the listed price in 100% of replies and acknowledge within 120 seconds (performance, with numbers); shall escalate anything clinical or custom-priced to a human (validation). A vibe became lines you can hand to a builder and test on every reply.

IDThe agent shall…LensVerify
SRS-F-01reply in the language the enquiry was received in.FuncTest
SRS-O-02run on WhatsApp and Instagram, Arabic and English, 24/7.OpReview
SRS-P-03acknowledge every enquiry within 120 seconds.PerfTest
SRS-P-04use the listed price in 100% of replies.PerfInspect
SRS-V-05escalate anything clinical or custom-priced to a human.ValTest

Traceability: nothing forgotten, nothing orphaned

The artifact a flight program lives by is the traceability matrix: read top to bottom, it proves every promise from the job is covered by a requirement, and every requirement has a test. Nothing is invented, nothing is missed. Your AI agent spec is the contract the build and the tests both answer to. Once it is written, the next step is picking the tools that meet it, and seeing why the tools matter least.

Want it built for you?

Skip the build. We engineer it to the same standard, live in 14 days.