# Test workflows

> Run workflows in tests and assert their output, steps, and failures.

Workflows are deterministic orchestration, so unlike agents, they usually support exact assertions on their output. Run the workflow with `executeWorkflow()` and check the result. Where a workflow includes an [agent](/learn/workflows/build-workflows#agent-steps) or [LLM](/learn/workflows/build-workflows#llm-steps) step, isolate that non-determinism so the rest of the workflow stays easy to assert on.

Test workflows at the boundary you care about:

| Test style             | Use when                                                           |
| ---------------------- | ------------------------------------------------------------------ |
| Run tests              | You want to execute the workflow and assert on its output          |
| Definition tests       | You want to verify the slug or input/output schema without running |
| Input validation tests | You want to confirm bad input is rejected                          |
| Integration tests      | You want real action, agent, or LLM steps to run end to end        |

Run tests from the project root:

```bash theme={null}
keystroke test --project unit          # src/**/*.test.ts
keystroke test --project integration   # src/**/*.int.test.ts
pnpm test                              # runs keystroke test via the package script
```

Vitest ships with `@keystrokehq/cli`. No project `vitest.config.ts` is required.

## Run a workflow in a test

`executeWorkflow()` runs one durable pass and resolves to a result you can assert on. The result is a discriminated union: `completed` with `output`, `failed` with `error`, or `suspended` when the run hit a [durable wait](/learn/workflows/build-workflows#durable-waits).

```ts theme={null}
import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import greeting from "./greeting";

describe("greeting workflow", () => {
  it("returns a greeting for the input name", async () => {
    const result = await executeWorkflow(greeting, { name: "Ada" });

    expect(result).toEqual({
      status: "completed",
      output: { message: "Hello, Ada" },
    });
  });
});
```

For a workflow whose steps are pure or call deterministic actions, asserting the full `output` like this is the most useful test.
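Every test that reads `output` has to narrow the union on `status` first. The following is a minimal sketch of a hypothetical `expectCompleted` helper that does this once. It returns the typed `output` for a completed run and throws a descriptive error for a failed or suspended one. The `WorkflowResult` type is a local model of the union described above, not a type exported by `@keystrokehq/keystroke`. The `error: unknown` field and the `items` shape are assumptions.

```ts
// Local model of the documented result union. This is an assumed shape, not the
// type exported by @keystrokehq/keystroke.
type WorkflowResult<O> =
  | { status: "completed"; output: O }
  | { status: "failed"; error: unknown }
  | { status: "suspended"; items: { kind: "sleep" | "hook" }[] };

// Hypothetical helper: return `output` for a completed run, otherwise throw an
// error that says why the run did not complete.
function expectCompleted<O>(result: WorkflowResult<O>): O {
  if (result.status === "failed") {
    throw new Error(`workflow failed: ${String(result.error)}`);
  }
  if (result.status === "suspended") {
    const kinds = result.items.map((item) => item.kind).join(", ");
    throw new Error(`workflow suspended on: ${kinds}`);
  }
  // Both other variants were ruled out, so TypeScript narrows to "completed" here.
  return result.output;
}
```

In a test, `expect(expectCompleted(result)).toEqual({ message: "Hello, Ada" })` reports why a run did not complete instead of showing only a mismatched object. This assumes the real result type is compatible with the local model.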

## Test durable waits

A workflow that calls `ctx.sleep()` or `ctx.hook()` suspends instead of completing in one pass. `executeWorkflow()` returns `{ status: "suspended", items }`, where each item's `kind` is `"sleep"` or `"hook"`. Assert that the run suspended where you expect:

```ts theme={null}
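import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { expect, it } from "vitest";
import approvalFlow from "./approval-flow";

it("suspends on the approval hook", async () => {
  const result = await executeWorkflow(approvalFlow, { id: "req_1" });

  expect(result.status).toBe("suspended");
  if (result.status === "suspended") {
    // The first suspension point should be a hook, not a sleep.
    expect(result.items[0].kind).toBe("hook");
  }
});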
```
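Checking only `items[0]` does not catch a run that suspends on more waits than you expect. The following sketch uses a hypothetical `suspendedKinds` helper that reduces a result to the `kind` of every suspended item, so a single assertion covers the whole list. The `RunResult` and `SuspendedItem` types are local assumptions that model only the documented `kind` field. Real items may carry other fields, which this sketch ignores.

```ts
// Assumed minimal shapes: only the documented `kind` field is modeled.
type SuspendedItem = { kind: "sleep" | "hook" };
type RunResult =
  | { status: "completed"; output: unknown }
  | { status: "failed"; error: unknown }
  | { status: "suspended"; items: SuspendedItem[] };

// Hypothetical helper: the kinds of every durable wait the run is suspended on,
// in order, or an empty array when the run did not suspend.
function suspendedKinds(result: RunResult): SuspendedItem["kind"][] {
  return result.status === "suspended" ? result.items.map((item) => item.kind) : [];
}

// In a Vitest test: expect(suspendedKinds(result)).toEqual(["hook"]);
const example: RunResult = { status: "suspended", items: [{ kind: "hook" }] };
```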

## Stub steps by seeding the event log

To test what a workflow does *after* an expensive step (an agent, LLM, or HTTP action) without running that step, seed the durable event log. Each step is checkpointed as a `step_completed` event keyed by a correlation id of the form `step:<key>#<occurrence>`, where `#0` is the first call at that spot and `#1` the second. The key is assigned automatically from the call's position; for unbuilt local runs it falls back to the action or agent slug (`step:research-signup#0`). When the log already holds a matching `step_completed` event, the runner reuses that result instead of executing the step.

```ts theme={null}
import { executeWorkflow, MemoryEventLog } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline", () => {
  it("drafts without calling the agent", async () => {
    const eventLog = new MemoryEventLog();
    const runId = "test-run";

    await eventLog.append({
      runId,
      type: "step_completed",
      correlationId: "step:research-signup#0",
      data: { brief: "stubbed brief" },
    });

    const result = await executeWorkflow(
      signupPipeline,
      { name: "Ada", email: "ada@example.com" },
      { runId, eventLog },
    );

    expect(result.status).toBe("completed");
  });
});
```

Pass a fixed `runId` and the same `MemoryEventLog` instance to both the seed and the run. The stored `data` stands in for the step's output and is re-validated against its output schema, so it must be schema-valid. The real `run` orchestration still executes; only the stubbed step body is skipped.
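When a workflow calls the same step more than once, each call needs its own event with the next occurrence number. The following is a minimal sketch that builds those events from a list of stubbed outputs using the `step:<key>#<occurrence>` format described above. `seedStepEvents`, `stepCorrelationId`, and the `SeedEvent` type are hypothetical. The object shape mirrors the `eventLog.append()` call in the example, and you would append each returned event before calling `executeWorkflow()`.

```ts
// Mirrors the fields passed to eventLog.append() in the example above.
type SeedEvent = {
  runId: string;
  type: "step_completed";
  correlationId: string;
  data: unknown;
};

// Correlation id format documented above: step:<key>#<occurrence>.
function stepCorrelationId(key: string, occurrence = 0): string {
  return `step:${key}#${occurrence}`;
}

// Hypothetical helper: one step_completed event per stubbed output, numbered
// #0, #1, ... in call order. Each `data` must still satisfy the step's output schema.
function seedStepEvents(runId: string, key: string, outputs: unknown[]): SeedEvent[] {
  return outputs.map((data, occurrence): SeedEvent => ({
    runId,
    type: "step_completed",
    correlationId: stepCorrelationId(key, occurrence),
    data,
  }));
}

// Usage in a test (eventLog is the MemoryEventLog from the example above):
// for (const event of seedStepEvents(runId, "research-signup", [first, second])) {
//   await eventLog.append(event);
// }
```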

## Definition tests

Definition tests are fast and never run the workflow. Use them to catch accidental slug changes or schema edits that would break callers and triggers.

```ts theme={null}
import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline workflow", () => {
  it("keeps its slug and input contract", () => {
    expect(signupPipeline.slug).toBe("signup-pipeline");
    expect(() => signupPipeline.input.parse({ name: "Ada", email: "ada@example.com" })).not.toThrow();
  });
});
```

Because `input` and `output` are Zod schemas, you can parse sample payloads against them directly.
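To check several payloads at once, you can call the schema's `safeParse()`, which Zod provides and which returns a result with a `success` flag instead of throwing. The following sketch uses a hypothetical `partitionSamples` helper, typed against a minimal structural `SafeParser` interface rather than importing Zod, so any object with a compatible `safeParse` (a Zod schema included) fits. The `fakeSignupInput` schema is hand-written for illustration only and stands in for something like `signupPipeline.input`.

```ts
// Minimal structural view of a schema; Zod schemas expose safeParse() with a `success` flag.
interface SafeParser {
  safeParse(value: unknown): { success: boolean };
}

// Hypothetical helper: split sample payloads into those the schema accepts and rejects.
function partitionSamples(schema: SafeParser, samples: readonly unknown[]) {
  const accepted: unknown[] = [];
  const rejected: unknown[] = [];
  for (const sample of samples) {
    if (schema.safeParse(sample).success) {
      accepted.push(sample);
    } else {
      rejected.push(sample);
    }
  }
  return { accepted, rejected };
}

// Hand-written stand-in for a real input schema, for illustration only.
const fakeSignupInput: SafeParser = {
  safeParse(value) {
    if (typeof value !== "object" || value === null) return { success: false };
    const { name, email } = value as { name?: unknown; email?: unknown };
    return {
      success: typeof name === "string" && typeof email === "string" && email.includes("@"),
    };
  },
};

const { accepted, rejected } = partitionSamples(fakeSignupInput, [
  { name: "Ada", email: "ada@example.com" },
  { name: "Ada", email: "not-an-email" },
]);
```

In a definition test, `expect(rejected).toEqual([{ name: "Ada", email: "not-an-email" }])` pins down exactly which payloads the contract refuses.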

## Input validation tests

A workflow rejects input that does not match its `input` schema before `run` executes. Assert that bad input is refused.

```ts theme={null}
import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline validation", () => {
  it("rejects a malformed email", async () => {
    await expect(
      executeWorkflow(signupPipeline, { name: "Ada", email: "not-an-email" }),
    ).rejects.toThrow();
  });
});
```

## Testing workflows with agent or LLM steps

A workflow that prompts an agent or calls `promptLlm()` is no longer fully deterministic, so assert on the parts that are stable rather than exact model text.

* **Test deterministic actions separately.** Move logic-heavy steps into [actions](/learn/actions/overview) and unit-test those directly, so the workflow test only has to check orchestration.
* **Assert on shape, not wording.** For a step that returns model text, assert the output is a non-empty string or matches a structured `outputSchema`, not an exact sentence.
* **Guard real model runs.** Tests that call a real model should skip when no provider key is available, so local and CI runs do not fail without credentials.

```ts theme={null}
import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import jokeFlow from "./joke-flow";

describe("joke-flow workflow", () => {
  it.skipIf(!process.env.ANTHROPIC_API_KEY)("produces a joke and a category", async () => {
    const result = await executeWorkflow(jokeFlow, {});

    expect(result.status).toBe("completed");
    if (result.status === "completed") {
      expect(result.output.joke.length).toBeGreaterThan(0);
      expect(result.output.category).toBeDefined();
    }
  });
});
```
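The `skipIf` guard above checks a single variable. If a workflow needs more than one provider, a small helper can report which keys are missing, so the skip condition and any log message come from one list. The following sketch uses a hypothetical `missingEnvKeys` helper. The key names in the usage comment are examples, not a list of what Keystroke requires.

```ts
// Hypothetical helper: names of required environment variables that are unset
// or empty, in the order they were listed.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((key) => !env[key]);
}

// Usage in a Vitest file (key names are examples only):
// const missing = missingEnvKeys(process.env, ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]);
// it.skipIf(missing.length > 0)("produces a joke and a category", async () => { ... });
```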

Run integration tests from the project root:

```bash theme={null}
keystroke test --project integration
# or
pnpm test -- --project integration
```

Integration tests load `.env` when present and skip when required keys are unset.

## Inspect failing runs

When a test fails, or a real run misbehaves, inspect the run before changing code. From the CLI:

```bash theme={null}
keystroke workflows runs list signup-pipeline --status failed
keystroke workflows runs get signup-pipeline <run-id> --include trigger,steps,trace
```

Including `steps` lists each recorded step and shows where the run failed. For deployed workflows, open **History** in the web app and filter to workflow runs; the detail panel shows input, output, steps, errors, and trace data. See [workflow runs](/learn/logs/workflow-runs).

## Next steps

<CardGroup cols={2}>
  <Card title="Build workflows" href="/learn/workflows/build-workflows">
    Compose actions, agents, and durable steps.
  </Card>

  <Card title="Run workflows" href="/learn/workflows/run-workflows">
    Start runs from the CLI, triggers, the API, and agent tools.
  </Card>

  <Card title="Workflow runs" href="/learn/logs/workflow-runs">
    Debug failed runs in the web app.
  </Card>

  <Card title="Deploy a project" href="/learn/projects/deploy-a-project">
    Run tests before deploying changed workflows.
  </Card>
</CardGroup>
