| Test style | Use when |
|---|---|
| Qualitative CLI runs | You want to see how the agent behaves across realistic prompts, follow-ups, and tool-use cases |
| Definition tests | You want to verify tools, model, prompt, or metadata without calling an LLM |
| Prompt smoke tests | You want to verify the agent can run against a real model |
| Tool-use tests | You need evidence that the agent called a required action or subagent |
| End-to-end tests | You want to run the project server and test API behavior |
Vitest and its config ship with @keystrokehq/cli. No project vitest.config.ts is required.
Qualitative tests
Before writing test files, run the agent the way you expect people to use it. Ask your coding agent to call the Keystroke CLI with a batch of prompts, inspect the sessions, and report where the agent misunderstood instructions, skipped tools, or used tools incorrectly.
- Normal requests the agent should handle cleanly.
- Tool-required requests where the agent must call an action, workflow, MCP tool, or subagent.
- Missing-information cases where the agent should ask a clarifying question instead of guessing.
- Follow-up messages in the same session.
- Edge cases that should be refused, escalated, or handled cautiously.
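One way to keep such a batch honest is to tag each prompt with the category it exercises and check that no category is empty before handing the batch to the CLI. The sketch below is only an illustration: the `Category` names, the `PromptCase` shape, and the sample prompts are hypothetical and are not part of any Keystroke API.

```typescript
// Hypothetical categories mirroring the list above; not a Keystroke type.
type Category =
  | "normal"
  | "tool-required"
  | "missing-information"
  | "follow-up"
  | "edge-case";

interface PromptCase {
  category: Category;
  prompt: string;
  // What a reviewer should look for in the session, in plain words.
  expectation: string;
}

const ALL_CATEGORIES: Category[] = [
  "normal",
  "tool-required",
  "missing-information",
  "follow-up",
  "edge-case",
];

// Returns the categories that have no prompt in the batch yet.
function uncoveredCategories(batch: PromptCase[]): Category[] {
  const covered = new Set(batch.map((c) => c.category));
  return ALL_CATEGORIES.filter((c) => !covered.has(c));
}

// Illustrative prompts only.
const batch: PromptCase[] = [
  { category: "normal", prompt: "Summarize yesterday's support tickets.", expectation: "Answers directly." },
  { category: "tool-required", prompt: "Create a ticket for the login bug.", expectation: "Calls the ticket tool." },
  { category: "missing-information", prompt: "Refund the customer.", expectation: "Asks which customer and order." },
];

console.log(uncoveredCategories(batch)); // [ 'follow-up', 'edge-case' ]
```

The `expectation` field is free text on purpose: qualitative runs are reviewed by reading sessions, not by automated assertions.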
Definition tests
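As a rough sketch, a definition test is a set of plain assertions over the agent's definition object, with no model call involved. The `AgentDefinition` shape, the `checkDefinition` helper, and the tool and model names below are assumptions made for illustration; the real Keystroke definition object may look different.

```typescript
// Hypothetical shape: the real agent definition may use other field names.
interface AgentDefinition {
  name: string;
  model: string;
  systemPrompt: string;
  tools: { name: string }[];
}

// Returns a list of problems; an empty list means the definition passes.
function checkDefinition(agent: AgentDefinition, requiredTools: string[]): string[] {
  const problems: string[] = [];
  if (agent.model.trim() === "") problems.push("model is empty");
  if (agent.systemPrompt.trim() === "") problems.push("system prompt is empty");

  const toolNames = new Set(agent.tools.map((t) => t.name));
  for (const required of requiredTools) {
    if (!toolNames.has(required)) problems.push(`missing tool: ${required}`);
  }
  return problems;
}

// Illustrative definition with placeholder values.
const supportAgent: AgentDefinition = {
  name: "support",
  model: "example-model",
  systemPrompt: "You help customers with support tickets.",
  tools: [{ name: "createTicket" }, { name: "lookupOrder" }],
};

console.log(checkDefinition(supportAgent, ["createTicket", "issueRefund"]));
// [ 'missing tool: issueRefund' ]
```

Returning a list of problems instead of throwing on the first one lets a single test report every gap in the definition at once.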
Definition tests are fast and do not need provider keys.
Smoke tests
The init template includes an agent integration test. Integration tests load .env when present and skip when required keys (like ANTHROPIC_API_KEY) are unset. Vitest and its config ship with @keystrokehq/cli, so no project vitest.config.ts is needed.
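The skip-when-unset behavior can be approximated with a small pure helper. This is a sketch under assumptions, not the code the init template ships: `missingKeys` and the `Env` type are hypothetical names, and the environment is passed in as an argument so the helper can be exercised without touching the real process environment.

```typescript
// A plain environment map; in a real test this would be process.env.
type Env = Record<string, string | undefined>;

// Returns the required keys that are absent or blank in the given environment.
function missingKeys(env: Env, required: string[]): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

const required = ["ANTHROPIC_API_KEY"];

console.log(missingKeys({ ANTHROPIC_API_KEY: "sk-example" }, required)); // []
console.log(missingKeys({}, required)); // [ 'ANTHROPIC_API_KEY' ]
```

In a Vitest file, the result could feed a conditional skip such as `describe.skipIf(missingKeys(process.env, required).length > 0)`, so the suite reports as skipped rather than failed on machines without provider keys.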
Tool use tests
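A tool-use assertion might look roughly like the sketch below, which filters a recorded transcript for calls to a named tool. The `RecordedMessage` union, the `toolCalls` helper, and the `createTicket` tool are hypothetical; the message format Keystroke actually records may differ, so treat the field names as placeholders.

```typescript
// Hypothetical message shape; the recorded format may differ.
type RecordedMessage =
  | { role: "user" | "assistant"; type: "text"; text: string }
  | { role: "assistant"; type: "tool-call"; toolName: string; input: unknown }
  | { role: "tool"; type: "tool-result"; toolName: string; output: unknown };

type ToolCall = Extract<RecordedMessage, { type: "tool-call" }>;

// Returns every recorded call to the named tool, in order.
function toolCalls(messages: RecordedMessage[], toolName: string): ToolCall[] {
  return messages.filter(
    (m): m is ToolCall => m.type === "tool-call" && m.toolName === toolName,
  );
}

// The agent really called the tool.
const transcript: RecordedMessage[] = [
  { role: "user", type: "text", text: "Open a ticket for the login bug." },
  { role: "assistant", type: "tool-call", toolName: "createTicket", input: { title: "Login bug" } },
  { role: "tool", type: "tool-result", toolName: "createTicket", output: { id: "T-1" } },
  { role: "assistant", type: "text", text: "Created ticket T-1." },
];

// The agent only claimed to have done the work.
const answerOnly: RecordedMessage[] = [
  { role: "user", type: "text", text: "Open a ticket for the login bug." },
  { role: "assistant", type: "text", text: "I created the ticket." },
];

console.log(toolCalls(transcript, "createTicket").length); // 1
console.log(toolCalls(answerOnly, "createTicket").length); // 0
```

The second transcript is the case a final-answer check would miss: the reply sounds correct, but no call was recorded.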
When a tool call is the contract, assert on the recorded messages rather than only the final answer.
Sessions and memory
Prompt tests create sessions. If a test should be repeatable, use a fresh session or disable memory on the agent under test. To test a follow-up, keep the sessionId and pass it to the next prompt.
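The difference between a fresh session and a continued one can be illustrated with an in-memory stand-in. `FakeAgent` and its `prompt` signature are invented for this sketch and do not reflect the Keystroke API; the only idea carried over from the text is that a prompt returns a sessionId that a later prompt can reuse.

```typescript
// Hypothetical stand-in for an agent with per-session memory.
interface PromptResult {
  sessionId: string;
  reply: string;
}

class FakeAgent {
  private sessions = new Map<string, string[]>();
  private nextId = 1;

  // Omitting sessionId starts a fresh session; passing one continues it.
  prompt(text: string, sessionId?: string): PromptResult {
    const id = sessionId ?? `session-${this.nextId++}`;
    const history = this.sessions.get(id) ?? [];
    history.push(text);
    this.sessions.set(id, history);
    return { sessionId: id, reply: `seen ${history.length} message(s)` };
  }
}

const agent = new FakeAgent();

// Follow-up: reuse the sessionId so the second prompt sees the first.
const first = agent.prompt("My order number is 1234.");
const followUp = agent.prompt("What was my order number?", first.sessionId);

// Repeatable test: no sessionId, so nothing carries over.
const fresh = agent.prompt("What was my order number?");

console.log(followUp.reply); // seen 2 message(s)
console.log(fresh.reply); // seen 1 message(s)
```

A test that asserts on follow-up behavior wants the first pattern; a test that must give the same result on every run wants the second.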
Failure inspection
When a prompt test fails, inspect the session before changing code.
Next steps
- Run agents: Prompt agents locally and inspect sessions from the CLI.
- Agent runs: Debug failed sessions in the web app.
- Actions as tools: Build deterministic tool contracts that are easier to test.
- Deploy a project: Run tests before deploying changed agents.