# Test agents

> Test agents qualitatively and with automated checks.

Agents are non-deterministic, so testing them starts with exercising behavior, not writing a perfect assertion. The fastest loop is often to have your coding agent prompt the Keystroke agent through the CLI, try realistic scenarios, inspect the resulting sessions, and iterate on the system instructions and tools.

Test agents at the boundary you care about:

| Test style           | Use when                                                                                       |
| -------------------- | ---------------------------------------------------------------------------------------------- |
| Qualitative CLI runs | You want to see how the agent behaves across realistic prompts, follow-ups, and tool-use cases |
| Definition tests     | You want to verify tools, model, prompt, or metadata without calling an LLM                    |
| Prompt smoke tests   | You want to verify the agent can run against a real model                                      |
| Tool-use tests       | You need evidence that the agent called a required action or subagent                          |
| End-to-end tests     | You want to run the project server and test API behavior                                       |

Start qualitatively, then turn the stable contracts you discover into tests. Automated tests are best for definition shape, required tool calls, and a small number of critical prompt paths.

Run tests from the project root:

```bash theme={null}
keystroke test --project unit          # src/**/*.test.ts
keystroke test --project integration   # src/**/*.int.test.ts
pnpm test                              # runs the test script, which calls keystroke test
```

Vitest ships with `@keystrokehq/cli`. No project `vitest.config.ts` is required.

## Qualitative tests

Before writing test files, run the agent the way you expect people to use it. Ask your coding agent to call the Keystroke CLI with a batch of prompts, inspect the sessions, and report where the agent misunderstood instructions, skipped tools, or used tools incorrectly.

```bash theme={null}
keystroke agents prompt support --message "A customer asks whether they can get a refund after 45 days."
keystroke agents prompt support --message "Look up order ORD-123 and decide whether it is refundable."
keystroke agents prompt support --message "Draft a concise Slack reply for that customer."
```

Good qualitative prompts cover:

* Normal requests the agent should handle cleanly.
* Tool-required requests where the agent must call an action, workflow, MCP tool, or subagent.
* Missing-information cases where the agent should ask a clarifying question instead of guessing.
* Follow-up messages in the same session.
* Edge cases that should be refused, escalated, or handled cautiously.
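
If you want your coding agent to run the same batch every time, it can help to keep the scenarios in a small list and generate the CLI commands from it. This is a sketch that relies only on the `keystroke agents prompt <agent> --message` form shown above. The `Scenario` type, the category names, and the extra prompts are hypothetical, and the quoting follows the usual POSIX single-quote convention.

```ts theme={null}
// Hypothetical categories mirroring the checklist above.
type Category = "normal" | "tool-required" | "missing-info" | "edge-case";

interface Scenario {
  category: Category;
  message: string;
}

// Quote a string for a POSIX shell: wrap it in single quotes and
// replace each embedded ' with '\'' (close quote, escaped quote, reopen).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Build one `keystroke agents prompt` command per scenario.
function promptCommands(agent: string, scenarios: Scenario[]): string[] {
  return scenarios.map(
    (s) => `keystroke agents prompt ${agent} --message ${shellQuote(s.message)}`,
  );
}

const scenarios: Scenario[] = [
  { category: "normal", message: "A customer asks whether they can get a refund after 45 days." },
  { category: "tool-required", message: "Look up order ORD-123 and decide whether it is refundable." },
  { category: "missing-info", message: "Refund my order." },
  { category: "edge-case", message: "Issue a refund for every order placed today." },
];

for (const command of promptCommands("support", scenarios)) {
  console.log(command);
}
```

Keeping the category next to each prompt makes it easy to check that every kind of case is still covered when the list grows.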

Then inspect the session:

```bash theme={null}
keystroke agents sessions get support <session-id> --include messages,events,trace
```

Use this loop to tune the system prompt, tools, skills, files, and model choice. Once the behavior feels right, write focused tests for the parts that should not regress.

## Definition tests

Definition tests import the agent definition and assert on its fields without calling a model, so they are fast and do not need provider keys.

```ts theme={null}
import { describe, expect, it } from "vitest";
import support from "./support";

describe("support agent", () => {
  it("uses the expected model and support files", () => {
    expect(support.slug).toBe("support");
    expect(support.model).toBe("anthropic/claude-sonnet-4.6");
    expect(support.systemPrompt).toContain("/workspace");
  });
});
```

Use this style to catch accidental model changes, missing tools, or prompt edits that remove required instructions.
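
When several agents share the same expectations, a small pure function can collect every mismatch at once, so one failing test reports all the problems. This is a sketch under assumptions: `DefinitionLike`, and in particular a `tools` list of tool names, is a simplified shape you would map your exported definition into, not the documented Keystroke type. The `lookup_order` tool name is hypothetical.

```ts theme={null}
// Assumed, simplified view of an agent definition. The real exported
// object may differ; map its fields into this shape before checking.
interface DefinitionLike {
  slug: string;
  model: string;
  systemPrompt: string;
  tools?: string[]; // hypothetical: names of the tools the agent exposes
}

interface Expectations {
  model: string;
  requiredTools?: string[];
  requiredInstructions?: string[]; // substrings the system prompt must keep
}

// Return human-readable problems; an empty list means the definition matches.
function definitionProblems(def: DefinitionLike, expected: Expectations): string[] {
  const problems: string[] = [];
  if (def.model !== expected.model) {
    problems.push(`${def.slug}: model is ${def.model}, expected ${expected.model}`);
  }
  const tools = new Set(def.tools ?? []);
  for (const tool of expected.requiredTools ?? []) {
    if (!tools.has(tool)) problems.push(`${def.slug}: missing tool ${tool}`);
  }
  for (const text of expected.requiredInstructions ?? []) {
    if (!def.systemPrompt.includes(text)) {
      problems.push(`${def.slug}: system prompt no longer mentions "${text}"`);
    }
  }
  return problems;
}

const problems = definitionProblems(
  { slug: "support", model: "anthropic/claude-sonnet-4.6", systemPrompt: "Files live in /workspace.", tools: [] },
  { model: "anthropic/claude-sonnet-4.6", requiredTools: ["lookup_order"], requiredInstructions: ["/workspace"] },
);
// Reports one problem: the hypothetical lookup_order tool is missing.
console.log(problems);
```

In a Vitest file you would assert `expect(definitionProblems(...)).toEqual([])`, so a failure lists every mismatch in one diff.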

## Smoke tests

The init template includes an agent integration test shaped like this:

```ts theme={null}
import { describe, expect, it } from "vitest";
import hello from "./hello";

describe("hello agent", () => {
  it.skipIf(!process.env.ANTHROPIC_API_KEY)("responds to a prompt", async () => {
    const result = await hello.prompt({ message: "Say hi in one word." });

    expect(result.messages.some((message) => message.role === "assistant")).toBe(true);
  });
});
```

The provider-key guard keeps local and CI runs from failing when real model credentials are not available.
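
If an agent needs more than one credential, for example a model key plus a key for a tool it calls, a small helper can compute the skip condition from the whole list. A minimal sketch: `missingKeys` is a hypothetical helper, `SLACK_BOT_TOKEN` is only an illustrative name, and the environment is passed in as a plain record so the function stays easy to test.

```ts theme={null}
// Return the names of required keys that are unset or blank in `env`.
// Blank values count as missing so an empty line in .env does not enable the test.
function missingKeys(env: Record<string, string | undefined>, required: string[]): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Example environment; in a test file you would pass process.env instead.
const env = { ANTHROPIC_API_KEY: "sk-test", SLACK_BOT_TOKEN: "" };
const missing = missingKeys(env, ["ANTHROPIC_API_KEY", "SLACK_BOT_TOKEN"]);
console.log(missing.length > 0 ? `skipping: missing ${missing.join(", ")}` : "running");
```

In a test you could then write `it.skipIf(missingKeys(process.env, ["ANTHROPIC_API_KEY"]).length > 0)(...)`.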

Run integration tests from the project root:

```bash theme={null}
keystroke test --project integration
# or
pnpm test -- --project integration
```

Integration tests load `.env` when present, and guarded tests skip when required keys such as `ANTHROPIC_API_KEY` are unset.

## Tool-use tests

When a tool call is the contract, assert on the recorded messages rather than only the final answer.

```ts theme={null}
import { describe, expect, it } from "vitest";
import orchestrator from "./orchestrator";

function usedTool(messages: Array<{ role: string; toolName?: string }>, toolName: string) {
  return messages.some((message) => message.role === "toolResult" && message.toolName === toolName);
}

describe("orchestrator agent", () => {
  it.skipIf(!process.env.ANTHROPIC_API_KEY)("delegates research", async () => {
    const result = await orchestrator.prompt({
      message: "Research whether the sky is blue. You must use ask_researcher.",
    });

    expect(result.error).toBeNull();
    expect(usedTool(result.messages, "ask_researcher")).toBe(true);
  });
});
```
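
When the contract is an order of tool calls rather than a single call, the same message shape can be checked for a subsequence. This minimal sketch reuses the `role === "toolResult"` and `toolName` fields from the example above. `toolSequence`, `calledInOrder`, and the tool names `lookup_order`, `check_policy`, and `issue_refund` are hypothetical.

```ts theme={null}
interface RecordedMessage {
  role: string;
  toolName?: string;
}

// Tool names in the order their results were recorded.
function toolSequence(messages: RecordedMessage[]): string[] {
  return messages
    .filter((m) => m.role === "toolResult" && m.toolName !== undefined)
    .map((m) => m.toolName as string);
}

// True if `expected` appears in order within the recorded calls.
// Other tool calls may be interleaved between the expected ones.
function calledInOrder(messages: RecordedMessage[], expected: string[]): boolean {
  let next = 0;
  for (const name of toolSequence(messages)) {
    if (next < expected.length && name === expected[next]) next++;
  }
  return next === expected.length;
}

const messages: RecordedMessage[] = [
  { role: "user" },
  { role: "toolResult", toolName: "lookup_order" },
  { role: "toolResult", toolName: "check_policy" },
  { role: "toolResult", toolName: "issue_refund" },
  { role: "assistant" },
];
console.log(calledInOrder(messages, ["lookup_order", "issue_refund"])); // true
console.log(calledInOrder(messages, ["issue_refund", "lookup_order"])); // false
```

Allowing interleaved calls keeps the assertion focused on the order that matters, so the test does not break when the agent makes an extra lookup.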

Keep the prompt narrow. Tests that ask for broad natural-language behavior are more likely to be flaky than tests that assert a specific tool contract.
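
For a critical path that stays somewhat non-deterministic even with a narrow prompt, one option is to run it several times and require a minimum pass rate instead of a single pass. This is a general testing pattern, not a Keystroke feature. `passRate` is a hypothetical helper, the `trial` callback stands in for one prompt plus its assertion, and the thresholds are placeholders. Each real trial calls the model, so keep the count small.

```ts theme={null}
// Run `trial` exactly `runs` times, one after another, and return the fraction that passed.
// A trial that throws counts as a failure.
async function passRate(trial: () => Promise<boolean>, runs: number): Promise<number> {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError("runs must be a positive integer");
  }
  let passed = 0;
  for (let i = 0; i < runs; i++) {
    try {
      if (await trial()) passed++;
    } catch {
      // Treat errors such as timeouts or provider failures as failed trials.
    }
  }
  return passed / runs;
}

// Deterministic stand-in for a prompt check: fails on every fifth call.
let call = 0;
const fakeTrial = async () => ++call % 5 !== 0;

passRate(fakeTrial, 5).then((rate) => {
  console.log(rate >= 0.8 ? `ok (${rate})` : `too flaky (${rate})`);
});
```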

## Sessions and memory

Prompt tests create sessions. If a test should be repeatable, use a fresh session or disable memory on the agent under test:

```ts theme={null}
export default defineAgent({
  slug: "classifier",
  systemPrompt: "Classify the message and return only the label.",
  model: "anthropic/claude-sonnet-4.6",
  memory: false,
});
```

If you need a multi-turn test, keep the returned `sessionId` and pass it to the next prompt:

```ts theme={null}
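import support from "./support";

// Reuse the sessionId from the first turn so the second prompt sees the earlier message.
const first = await support.prompt({ message: "Remember the word orchid." });
const second = await support.prompt({
  sessionId: first.sessionId,
  message: "What word did I ask you to remember?",
});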
```
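
For longer conversations, a small helper can thread `sessionId` through every turn. This is a sketch under assumptions: `runConversation` is hypothetical, its prompt parameter only models the `message` and `sessionId` inputs and the `sessionId` result field used above, and the fake prompt function exists so the example runs without a model. In a test you would pass something like `(input) => support.prompt(input)`.

```ts theme={null}
type PromptInput = { message: string; sessionId?: string };

// Send each message in order, reusing the session created by the first turn.
// Returns every turn's result so a test can assert on any of them.
async function runConversation<R extends { sessionId: string }>(
  prompt: (input: PromptInput) => Promise<R>,
  messages: string[],
): Promise<R[]> {
  const results: R[] = [];
  let sessionId: string | undefined;
  for (const message of messages) {
    const result = await prompt(sessionId ? { message, sessionId } : { message });
    sessionId = result.sessionId;
    results.push(result);
  }
  return results;
}

// Fake prompt: creates a session when none is passed and records whether one was reused.
let created = 0;
const fakePrompt = async (input: PromptInput) => {
  const sessionId = input.sessionId ?? `session-${++created}`;
  return { sessionId, reusedSession: input.sessionId !== undefined };
};

runConversation(fakePrompt, ["Remember the word orchid.", "What word did I ask you to remember?"]).then(
  (turns) => console.log(turns.map((t) => `${t.sessionId} reused=${t.reusedSession}`)),
);
```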

## Failure inspection

When a prompt test fails, inspect the session before changing code:

```bash theme={null}
keystroke agents sessions list support --status failed
keystroke agents sessions get support <session-id> --include messages,events,trace
```

For deployed agents, use **History** in the web app and filter to agent runs. The detail panel shows messages, tool calls, metadata, and trace data.
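
It can save a step to put the session ID and the inspection command into the assertion message, so a failing run points straight at the session to open. A sketch: `failureHint` is a hypothetical helper, `PromptResult` assumes only the `sessionId` and `error` fields used in the tests above, and the command string mirrors the CLI call shown in this section.

```ts theme={null}
interface PromptResult {
  sessionId: string;
  error: unknown; // null when the run succeeded, as asserted in the tool-use test
}

// Build a message that names the session and the CLI command to inspect it.
function failureHint(agent: string, result: PromptResult): string {
  const reason = result.error === null ? "assertion failed" : `run error: ${String(result.error)}`;
  return [
    `${agent} session ${result.sessionId}: ${reason}`,
    `inspect with: keystroke agents sessions get ${agent} ${result.sessionId} --include messages,events,trace`,
  ].join("\n");
}

console.log(failureHint("support", { sessionId: "sess_123", error: null }));
```

Vitest's `expect` accepts a custom message as its second argument, so you could write `expect(usedTool(result.messages, "ask_researcher"), failureHint("orchestrator", result)).toBe(true)`.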

## Next steps

<CardGroup cols={2}>
  <Card title="Run agents" href="/learn/agents/run-agents">
    Prompt agents locally and inspect sessions from the CLI.
  </Card>

  <Card title="Agent runs" href="/learn/logs/agent-runs">
    Debug failed sessions in the web app.
  </Card>

  <Card title="Actions as tools" href="/learn/actions/agent-tools">
    Build deterministic tool contracts that are easier to test.
  </Card>

  <Card title="Deploy a project" href="/learn/projects/deploy-a-project">
    Run tests before deploying changed agents.
  </Card>
</CardGroup>
