Lesson 09 · Design Before You Build
How to write an AI agent prompt, like a spec
A prompt is a wish; a spec is a testable instruction. How to write an AI agent prompt as requirements with numbers, so it does the same thing every time and you can prove it.
Most people write an AI agent prompt as a wish: "be helpful and professional." Then they wonder why it answers differently every time and nobody can tell if it is right. A prompt is a wish. A spec is a testable instruction. The way to write an AI agent prompt that actually works is to stop writing prose and start writing requirements, with numbers, so the agent does the same thing every time and you can prove it.
Why a clever prompt still fails in production
Two kinds of prompt demo well and break quietly. The first is the vibe prompt: "be helpful and professional." Helpful by whose measure? The same enquiry gets a different reply each time. The second is the mega-prompt: three pages of instructions piled up over weeks, where nobody can say which line does what, so nobody can test it, fix it, or trust it. Flight software is never given a wish: every behaviour is a numbered requirement with a way to verify it. That is why you can trust a machine you can never touch again, and it is how an agent earns the same trust.
A prompt versus a spec
A prompt tells the AI what you want. A spec tells you when it is wrong. The test of a real requirement: could two people disagree on whether it was met? If yes, it is still a vibe, so add the number. "Reply quickly" cannot be tested; "acknowledge within 120 seconds" can be checked on every single reply. In flight terms this is the Software Requirements Specification, the SRS: it takes each block of your design and writes, in numbered lines, exactly what it must do.
Write one requirement: the five lenses
One requirement is a single "shall" with five things attached, so anyone can build to it and prove it. Five lenses make sure nothing is missed; most AI projects only ever write the first. Functional: what it does. Operational: the conditions it runs in (channels, languages, hours). Performance: how well, with a number. This is the lens people skip, and the one that carries the figures. Implementation: how it is built. Validation: how it is proven. Every line also names a parent (the block or job it serves), a rationale (the cost it prevents), and a verification method.
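One way to keep every line honest is to hold each requirement as a record and let a script flag what is missing. The sketch below is a minimal illustration, not a standard format: the `Requirement` fields, the `Lens` abbreviations, and the parent string in the example are assumptions made for this lesson.

```python
from dataclasses import dataclass
from enum import Enum


class Lens(Enum):
    """The five lenses; the short labels are assumed abbreviations."""
    FUNCTIONAL = "Func"
    OPERATIONAL = "Op"
    PERFORMANCE = "Perf"
    IMPLEMENTATION = "Impl"
    VALIDATION = "Val"


@dataclass(frozen=True)
class Requirement:
    req_id: str
    shall: str       # the single "shall" statement
    lens: Lens
    parent: str      # the block or job it serves
    rationale: str   # the cost it prevents
    verify: str      # the verification method


def missing_fields(req: Requirement) -> list[str]:
    """Names of the attachments that were left blank."""
    return [
        name
        for name in ("parent", "rationale", "verify")
        if not getattr(req, name).strip()
    ]


def unused_lenses(reqs: list[Requirement]) -> list[Lens]:
    """Lenses that no requirement in the list has been written for."""
    used = {r.lens for r in reqs}
    return [lens for lens in Lens if lens not in used]


ack = Requirement(
    req_id="SRS-P-03",
    shall="acknowledge every enquiry within 120 seconds",
    lens=Lens.PERFORMANCE,
    parent="first reply",  # hypothetical parent block
    rationale="",          # left blank on purpose to show the check
    verify="Test",
)

print(missing_fields(ack))                            # ['rationale']
print([lens.value for lens in unused_lenses([ack])])  # ['Func', 'Op', 'Impl', 'Val']
```

A spec that only ever fills the functional lens would show up here as four unused lenses, which is the gap the paragraph above warns about.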
Set the budgets first
Before you write the lines, set the ceilings, the way a satellite allocates a power or pointing budget. There are four: a latency budget (say 120 seconds total, split across the blocks), an accuracy budget (price 100%, language 99%), a cost budget (under a set amount per conversation, so it holds at volume), and an escalation budget (hand-offs to a human below a set rate). Every requirement then has to fit inside the budgets, and trade-offs are explicit instead of discovered in production.
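The budget arithmetic is simple enough to check in a few lines. This is a rough sketch under stated assumptions: the 120-second total and the accuracy targets come from the paragraph above, while the block names and the way the seconds are split are hypothetical.

```python
LATENCY_BUDGET_SECONDS = 120  # total ceiling from the lesson

# Hypothetical split across blocks; names and shares are assumptions.
latency_allocation = {"intake": 10, "lookup": 30, "draft": 50, "review": 20}

# Accuracy targets from the lesson, in percent.
accuracy_targets = {"price": 100, "language": 99}


def latency_headroom(total: int, allocation: dict[str, int]) -> int:
    """Seconds left unallocated; a negative value means the blocks overspend."""
    return total - sum(allocation.values())


def meets_accuracy(correct: int, total: int, target_percent: int) -> bool:
    """Integer comparison avoids float rounding at the 100% boundary."""
    return correct * 100 >= target_percent * total


print(latency_headroom(LATENCY_BUDGET_SECONDS, latency_allocation))  # 10
print(meets_accuracy(199, 200, accuracy_targets["language"]))        # True
print(meets_accuracy(199, 200, accuracy_targets["price"]))           # False
```

The same 199 correct replies out of 200 pass the language budget and fail the price budget, which is the kind of trade-off the budgets are meant to surface before production.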
A worked example: turning a vibe into a spec
Take the review step of the clinic agent. The vibe was "reply nicely and do not mess up the price." Specced, it becomes testable lines: the agent shall reply in the language received (functional); shall use the listed price in 100% of replies and acknowledge within 120 seconds (performance, with numbers); shall escalate anything clinical or custom-priced to a human (validation). A vibe became lines you can hand to a builder and test on every reply.
| ID | The agent shall… | Lens | Verify |
|---|---|---|---|
| SRS-F-01 | reply in the language the enquiry was received in. | Func | Test |
| SRS-O-02 | run on WhatsApp and Instagram, Arabic and English, 24/7. | Op | Review |
| SRS-P-03 | acknowledge every enquiry within 120 seconds. | Perf | Test |
| SRS-P-04 | use the listed price in 100% of replies. | Perf | Inspect |
| SRS-V-05 | escalate anything clinical or custom-priced to a human. | Val | Test |
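The three rows marked "Test" in the table can each be written as a check that runs on every logged exchange. The following is a minimal sketch, not the clinic agent's real code: the `Exchange` record and its field names are hypothetical, and the rows verified by review or inspection are left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One enquiry and its reply; all field names are assumptions."""
    enquiry_language: str
    reply_language: str
    ack_seconds: float
    needs_human: bool  # the enquiry was clinical or custom-priced
    escalated: bool


# One check per "Test" row, keyed by the requirement ID from the table.
CHECKS = {
    "SRS-F-01": lambda e: e.reply_language == e.enquiry_language,
    "SRS-P-03": lambda e: e.ack_seconds <= 120,
    "SRS-V-05": lambda e: e.escalated or not e.needs_human,
}


def failed_requirements(exchange: Exchange) -> list[str]:
    """IDs of the requirements this exchange violates."""
    return [req_id for req_id, check in CHECKS.items() if not check(exchange)]


bad = Exchange(
    enquiry_language="ar",
    reply_language="en",
    ack_seconds=45,
    needs_human=True,
    escalated=False,
)
print(failed_requirements(bad))  # ['SRS-F-01', 'SRS-V-05']
```

Because each failure comes back as a requirement ID, a bad reply points at the exact line of the spec it broke.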
Traceability: nothing forgotten, nothing orphaned
The artifact a flight program lives by is the traceability matrix: read top to bottom, it proves every promise from the job is covered by a requirement, and every requirement has a test. Nothing is invented, nothing is missed. Your AI agent spec is the contract the build and the tests both answer to. Once it is written, the next step is picking the tools that meet it, and seeing why the tools matter least.
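A traceability check can be sketched as three set comparisons. Everything in the example data is hypothetical, including the promise names, the test names, and the deliberately orphaned `SRS-X-99`; only the idea of tracing promise to requirement to test comes from the lesson.

```python
def trace_gaps(
    promises: set[str],
    requirements: dict[str, str],
    tests: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Find gaps in the chain promise -> requirement -> test.

    requirements maps a requirement ID to its parent promise;
    tests maps a requirement ID to the names of the tests that verify it.
    """
    covered = set(requirements.values())
    return {
        # Promises from the job that no requirement serves.
        "uncovered_promises": sorted(p for p in promises if p not in covered),
        # Requirements whose parent is not a known promise.
        "orphan_requirements": sorted(
            r for r, parent in requirements.items() if parent not in promises
        ),
        # Requirements with no test listed against them.
        "untested_requirements": sorted(
            r for r in requirements if not tests.get(r)
        ),
    }


promises = {"language", "speed", "price", "safety"}
requirements = {
    "SRS-F-01": "language",
    "SRS-P-03": "speed",
    "SRS-P-04": "price",
    "SRS-X-99": "upsell",  # hypothetical orphan: no such promise exists
}
tests = {
    "SRS-F-01": ["test_reply_language"],
    "SRS-P-03": ["test_ack_time"],
    "SRS-P-04": [],
}

print(trace_gaps(promises, requirements, tests))
# {'uncovered_promises': ['safety'],
#  'orphan_requirements': ['SRS-X-99'],
#  'untested_requirements': ['SRS-P-04', 'SRS-X-99']}
```

An empty result in all three lists is the "nothing forgotten, nothing orphaned" state the matrix is meant to prove.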