Why 90 Days Matters
We impose a strict 90-day timeline on every AI pilot we advise. Not because 90 days is magic, but because without a hard deadline, pilots expand indefinitely. They accumulate scope, stakeholders, and complexity until they are too expensive to kill and too fragile to ship.
The 90-day constraint forces three things that matter: ruthless scoping, early production readiness, and honest evaluation. Here is the playbook.
Week 1-2: Define the Kill Criteria
Before writing a single line of code, define the criteria that would cause you to abandon the pilot. This is the step everyone skips and the reason most pilots become zombies.
Kill criteria should be specific and measurable:
- "If the model does not achieve 85% accuracy on our test set of 500 representative cases by week 6, we kill the pilot"
- "If integration with our CRM requires more than 3 engineer-weeks of custom work, we kill the pilot"
- "If users in the pilot group do not adopt the tool for at least 60% of relevant tasks by week 10, we kill the pilot"
Write these down. Get leadership sign-off. Make them non-negotiable. The most important function of kill criteria is preventing the sunk cost fallacy from keeping a failing pilot alive.
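One way to keep kill criteria non-negotiable is to record them as data that a script can check, rather than as a slide. The following is a minimal sketch, not a prescribed implementation: the `KillCriterion` class, the metric names, and the `triggered_kill_criteria` helper are hypothetical names chosen for illustration. The thresholds come from the three examples above. The CRM criterion has no stated deadline, so the sketch treats it as checkable at any time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def at_least(observed: float, threshold: float) -> bool:
    return observed >= threshold


def at_most(observed: float, threshold: float) -> bool:
    return observed <= threshold


@dataclass(frozen=True)
class KillCriterion:
    name: str                                # hypothetical metric name
    threshold: float
    passes: Callable[[float, float], bool]   # True when the observed value is acceptable
    deadline_week: Optional[int] = None      # None means "checked whenever a value exists"


# Thresholds taken from the three example criteria in the text.
CRITERIA = [
    KillCriterion("test_set_accuracy", 0.85, at_least, deadline_week=6),
    KillCriterion("crm_integration_engineer_weeks", 3.0, at_most),
    KillCriterion("adoption_rate", 0.60, at_least, deadline_week=10),
]


def triggered_kill_criteria(observed: Dict[str, float], current_week: int) -> List[str]:
    """Return the names of criteria that are due and have failed."""
    failed = []
    for criterion in CRITERIA:
        if criterion.name not in observed:
            continue  # nothing measured yet for this criterion
        due = criterion.deadline_week is None or current_week >= criterion.deadline_week
        if due and not criterion.passes(observed[criterion.name], criterion.threshold):
            failed.append(criterion.name)
    return failed
```

Calling `triggered_kill_criteria({"test_set_accuracy": 0.80}, current_week=6)` returns the accuracy criterion as failed, while the same measurement at week 5 returns an empty list because the deadline has not arrived.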
Week 2-4: Build on Production Rails
The biggest mistake in AI pilots is building in a sandbox. When you build on isolated infrastructure with clean data, you learn nothing about production viability. You just create a demo that will need to be completely rebuilt.
From day one:
- Use your actual production data pipeline. Yes, it is messier. That is the point.
- Deploy on infrastructure that can scale. If you prove the concept on a laptop, you have not proven it can run in production.
- Implement monitoring and logging from the start. You cannot evaluate what you cannot measure.
- Build the fallback path. What happens when the AI is unavailable? Users need to be able to work without it, and you need to measure the difference.
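The fallback and logging points can be combined in a single wrapper. This is a rough sketch under stated assumptions: `run_with_fallback`, `ai_handler`, and `manual_handler` are hypothetical names, and the two handlers stand in for whatever your AI call and your existing non-AI workflow actually are. It records which path served each task and how long it took, so the two paths can be compared later.

```python
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("ai_pilot")  # hypothetical logger name


def run_with_fallback(
    task_input: str,
    ai_handler: Callable[[str], str],
    manual_handler: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the AI path first; on any error, fall back to the non-AI path.

    Returns (result, path), where path is "ai" or "fallback".
    """
    start = time.perf_counter()
    try:
        result = ai_handler(task_input)
        path = "ai"
    except Exception:
        # Log the full traceback so failure patterns can be reviewed later.
        logger.exception("AI path failed; using fallback")
        result = manual_handler(task_input)
        path = "fallback"
    elapsed_ms = (time.perf_counter() - start) * 1000
    # One log line per task: enough to compare the AI and fallback paths.
    logger.info("task_completed path=%s elapsed_ms=%.1f", path, elapsed_ms)
    return result, path
```

Catching every exception is a deliberate simplification here. In a real deployment you would likely narrow it to timeouts and service errors so that programming mistakes still surface.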
Week 4-8: Controlled Rollout with Measurement
Do not wait for perfection. Deploy to a small group of real users who understand they are testing something new. Measure everything:
- Task completion time: Are users faster with the AI than without it? If not, why not?
- Quality metrics: Are the outputs accurate enough for the use case? Where does the model fail, and is the failure pattern acceptable?
- User behavior: Are users actually using the AI, or are they working around it? Over-reliance is as dangerous as under-adoption. You want users who verify AI output, not users who blindly trust it.
- Cost per task: What does each AI-assisted task cost in terms of compute, API calls, and latency? Is this economically viable at scale?
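These measurements can be rolled up from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape with four fields (whether the AI was used, time taken, whether the output was correct, and cost in US dollars). Your own logging schema will differ, and the choice of median over mean for completion time is an assumption made to dampen outliers.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class TaskRecord:
    used_ai: bool     # did the user complete this task with the AI?
    seconds: float    # task completion time
    correct: bool     # was the final output acceptable?
    cost_usd: float   # compute and API cost attributed to this task


def summarize(records: List[TaskRecord]) -> Dict[str, float]:
    """Summarize adoption, speed, quality, and cost for a pilot group."""
    ai = [r for r in records if r.used_ai]
    manual = [r for r in records if not r.used_ai]
    if not ai or not manual:
        raise ValueError("need both AI-assisted and non-AI tasks to compare")
    return {
        "adoption_rate": len(ai) / len(records),
        "median_seconds_ai": median(r.seconds for r in ai),
        "median_seconds_manual": median(r.seconds for r in manual),
        "ai_accuracy": sum(r.correct for r in ai) / len(ai),
        "cost_per_ai_task_usd": sum(r.cost_usd for r in ai) / len(ai),
    }
```

Note that this summary cannot tell you whether users verified the AI output before accepting it. Detecting over-reliance needs a separate signal, such as edit rates or spot checks.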
Week 8-10: Honest Evaluation
Check your results against the kill criteria. This is where most organizations fail, because killing a pilot feels like admitting failure. It is not. It is the best possible use of the money you have spent: you learned that this approach does not work before committing to a full production rollout.
The evaluation should produce one of three outcomes:
- Ship: The pilot met all criteria. Move to production with a 30-day plan.
- Kill: The pilot failed on fundamental criteria that more time will not fix. Document the learnings and move on.
- Pivot: The pilot revealed a different, more valuable use case than the original hypothesis. Restart the 90-day clock with the new scope.
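The three outcomes can be written down as an explicit decision rule, which makes it harder to invent a fourth outcome such as "extend the pilot". This is only a sketch: the `Outcome` enum and `decide` function are hypothetical, and whether a better use case was found remains a human judgment passed in as a flag. The ordering is also an assumption: a pilot that met every criterion ships even if a new use case surfaced.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    SHIP = "ship"
    KILL = "kill"
    PIVOT = "pivot"


def decide(failed_criteria: List[str], better_use_case_found: bool) -> Outcome:
    """Map evaluation results to exactly one of the three outcomes."""
    if not failed_criteria:
        return Outcome.SHIP    # met all criteria
    if better_use_case_found:
        return Outcome.PIVOT   # restart the 90-day clock with the new scope
    return Outcome.KILL        # failed, and no better hypothesis emerged
```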
Week 10-12: Ship or Kill
If the decision is to ship, spend the final two weeks on production hardening, documentation, on-call setup, and a 30-day monitoring plan for the initial production deployment. If the decision is to kill, spend one day documenting what you learned and move your team to the next opportunity.
The playbook is simple. Following it requires organizational courage: the willingness to kill things early, to build on messy real-world data, and to measure honestly rather than optimistically. That courage is what separates companies that ship AI from companies that pilot AI forever.