Every scaffold ships with an eval suite, a mock connector pack, and a runnable eval plan, followed by an execution phase that runs the plan offline and reports pass, fail, or blocked for each case.
1. A check that is declared but never exercised by a case shows as untested: a real gap, not just a missing checklist item.
2. If a minimum threshold is missing or unreadable, the check conservatively requires a 100% pass rate rather than being skipped.
3. Mock execution proves the scaffold's wiring is sound. It does not grade genuine agent reasoning against a live system.
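The first two rules can be sketched in a few lines of Python. This is a minimal illustration, not the scaffold's actual implementation: the names `CaseResult`, `resolve_threshold`, and `summarize`, the `"met"` / `"not met"` labels, and the choice to count a blocked case as not passed are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: str
    check: str   # name of the declared check this case exercises
    status: str  # "pass", "fail", or "blocked"


def resolve_threshold(raw):
    """Return a minimum pass rate in [0, 1].

    A missing or unreadable value falls back to 1.0 (a 100% requirement)
    instead of causing the check to be skipped.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 1.0
    # NaN and out-of-range values also count as unreadable.
    if not 0.0 <= value <= 1.0:
        return 1.0
    return value


def summarize(declared_checks, thresholds, results):
    """Map each declared check to "untested", "met", or "not met"."""
    report = {}
    for check in declared_checks:
        cases = [r for r in results if r.check == check]
        if not cases:
            # Declared but never exercised: surface it as a gap.
            report[check] = "untested"
            continue
        # Assumption: a blocked case does not count as a pass.
        passed = sum(1 for r in cases if r.status == "pass")
        rate = passed / len(cases)
        minimum = resolve_threshold(thresholds.get(check))
        report[check] = "met" if rate >= minimum else "not met"
    return report


if __name__ == "__main__":
    results = [
        CaseResult("c1", "auth", "pass"),
        CaseResult("c2", "auth", "fail"),
        CaseResult("c3", "retry", "pass"),
        CaseResult("c4", "retry", "blocked"),
    ]
    # "retry" has an unreadable threshold; "schema" has no cases at all.
    print(summarize(["auth", "retry", "schema"],
                    {"auth": "0.5", "retry": "n/a"},
                    results))
```

In this sketch, falling back to 1.0 means a broken or absent threshold can only make the check stricter, never silently more lenient.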