Test workflows - Keystroke

Workflows are deterministic orchestration, so unlike agents you can usually write real assertions on their output. Run the workflow with executeWorkflow() and check the result. Where a workflow includes an agent or LLM step, isolate that non-determinism so the rest of the workflow stays easy to assert.

Test workflows at the boundary you care about:

Test style	Use when
Run tests	You want to execute the workflow and assert on its output
Definition tests	You want to verify the slug or input/output schema without running
Input validation tests	You want to confirm bad input is rejected
Integration tests	You want real action, agent, or LLM steps to run end to end

Run tests from the project root:

keystroke test --project unit          # src/**/*.test.ts
keystroke test --project integration   # src/**/*.int.test.ts
pnpm test                              # same — calls keystroke test

Vitest ships with @keystrokehq/cli. No project vitest.config.ts is required.

Run a workflow in a test

executeWorkflow() runs one durable pass and resolves to a result you can assert on. The result is a discriminated union: completed with output, failed with error, or suspended when the run hit a durable wait.

import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import greeting from "./greeting";

describe("greeting workflow", () => {
  it("returns a greeting for the input name", async () => {
    const result = await executeWorkflow(greeting, { name: "Ada" });

    expect(result).toEqual({
      status: "completed",
      output: { message: "Hello, Ada" },
    });
  });
});

For a workflow whose steps are pure or call deterministic actions, asserting the full output like this is the most useful test.
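When a test only cares about the completed output, narrowing the result union once can keep the assertions short. The following is a minimal sketch, not part of the Keystroke API: the WorkflowResult type is a local stand-in modeled on the three statuses described above (its field types are assumptions), and unwrapCompleted is a hypothetical helper name.

```typescript
// Local stand-in for the result union described above. The field types
// (unknown error, item kinds) are assumptions made for this sketch.
type SuspendedItem = { kind: "sleep" | "hook" };
type WorkflowResult<TOutput> =
  | { status: "completed"; output: TOutput }
  | { status: "failed"; error: unknown }
  | { status: "suspended"; items: SuspendedItem[] };

// Hypothetical helper: return the output of a completed run, or throw with a
// message that names the status the run actually ended in.
function unwrapCompleted<TOutput>(result: WorkflowResult<TOutput>): TOutput {
  switch (result.status) {
    case "completed":
      return result.output;
    case "failed":
      throw new Error(`workflow failed: ${String(result.error)}`);
    case "suspended": {
      const kinds = result.items.map((item) => item.kind).join(", ");
      throw new Error(`workflow suspended on: ${kinds}`);
    }
  }
}

// Hand-built result for illustration; in a real test this value would come
// from executeWorkflow().
const sampleResult: WorkflowResult<{ message: string }> = {
  status: "completed",
  output: { message: "Hello, Ada" },
};
const greetingOutput = unwrapCompleted(sampleResult);
```

A test that unexpectedly suspends or fails then reports the actual status in the thrown message instead of a bare status mismatch.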

Test durable waits

A workflow that calls ctx.sleep() or ctx.hook() suspends instead of completing in one pass. executeWorkflow() returns { status: "suspended", items }, where each item’s kind is "sleep" or "hook". Assert that the run suspended where you expect:

const result = await executeWorkflow(approvalFlow, { id: "req_1" });

expect(result.status).toBe("suspended");
if (result.status === "suspended") {
  expect(result.items[0].kind).toBe("hook");
}
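
If a run can be waiting on more than one item, checking only items[0] may miss a wait. As a rough sketch, assuming the suspended result carries only the status, items, and kind fields described above, a hypothetical countWaits helper can tally the pending waits by kind:

```typescript
// Local stand-in for the suspended result; fields beyond status, items, and
// kind are omitted because the text above does not describe them.
type WaitKind = "sleep" | "hook";
type SuspendedResult = { status: "suspended"; items: { kind: WaitKind }[] };

// Hypothetical helper: count pending waits by kind so a test can assert on
// every wait, not only the first item.
function countWaits(result: SuspendedResult): Record<WaitKind, number> {
  const counts: Record<WaitKind, number> = { sleep: 0, hook: 0 };
  for (const item of result.items) {
    counts[item.kind] += 1;
  }
  return counts;
}

// Hand-built example; a real value would come from executeWorkflow().
const pending = countWaits({
  status: "suspended",
  items: [{ kind: "hook" }, { kind: "sleep" }, { kind: "hook" }],
});
```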

Stub steps by seeding the event log

To test what a workflow does after an expensive step (an agent, LLM, or HTTP action) without running that step, seed the durable event log. Each step is checkpointed as a step_completed event keyed by a correlation id of the form step:<key>#<occurrence>, where the occurrence is #0 for the first call at that spot and #1 for the second. The key is assigned automatically from the call’s position; for unbuilt local runs it falls back to the action or agent slug (for example, step:research-signup#0). Pre-seeding a step_completed event makes the runner reuse its stored result instead of executing the step.

import { executeWorkflow, MemoryEventLog } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline", () => {
  it("drafts without calling the agent", async () => {
    const eventLog = new MemoryEventLog();
    const runId = "test-run";

    await eventLog.append({
      runId,
      type: "step_completed",
      correlationId: "step:research-signup#0",
      data: { brief: "stubbed brief" },
    });

    const result = await executeWorkflow(
      signupPipeline,
      { name: "Ada", email: "ada@example.com" },
      { runId, eventLog },
    );

    expect(result.status).toBe("completed");
  });
});

Pass the same fixed runId and MemoryEventLog to executeWorkflow() that you used when seeding the event. The stored data is the action’s output and is re-validated against its output schema, so it must be schema-valid. The run still exercises the real orchestration; only the stubbed step body is skipped.
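
Hand-typed correlation ids are easy to get wrong when several steps are stubbed. The sketch below builds them from the step:<key>#<occurrence> form described above. Both helper names are hypothetical and not part of the Keystroke API, and the event fields simply mirror the eventLog.append() call in the example.

```typescript
// Hypothetical helper: format a correlation id as step:<key>#<occurrence>.
function stepCorrelationId(key: string, occurrence = 0): string {
  // Reject negative, fractional, and NaN occurrences.
  if (occurrence < 0 || occurrence % 1 !== 0) {
    throw new RangeError(`occurrence must be a non-negative integer, got ${occurrence}`);
  }
  return `step:${key}#${occurrence}`;
}

// Hypothetical helper: build the payload for one stubbed step. The field
// names mirror the eventLog.append() call shown above.
function stubbedStepEvent(runId: string, key: string, data: unknown, occurrence = 0) {
  return {
    runId,
    type: "step_completed" as const,
    correlationId: stepCorrelationId(key, occurrence),
    data,
  };
}

// First and second call of the same step within one run.
const firstCall = stubbedStepEvent("test-run", "research-signup", { brief: "stubbed brief" });
const secondCall = stubbedStepEvent("test-run", "research-signup", { brief: "second brief" }, 1);
```

Each returned object could then be passed to eventLog.append() before calling executeWorkflow() with the same runId.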

Definition tests

Definition tests are fast and never run the workflow. Use them to catch accidental slug changes or schema edits that would break callers and triggers.

import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline workflow", () => {
  it("keeps its slug and input contract", () => {
    expect(signupPipeline.slug).toBe("signup-pipeline");
    expect(() => signupPipeline.input.parse({ name: "Ada", email: "ada@example.com" })).not.toThrow();
  });
});

Because input and output are Zod schemas, you can parse sample payloads against them directly.
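
One way to check several sample payloads at once is to split them into accepted and rejected sets with safeParse(), which Zod schemas provide. This is a sketch under stated assumptions: SchemaLike is a local structural slice of a Zod schema, partitionPayloads is a hypothetical helper, and signupInputStandIn is a hand-written stand-in for signupPipeline.input so the block runs without dependencies. The real schema's rules may differ.

```typescript
// Structural slice of a Zod schema: only safeParse() and its success flag.
interface SchemaLike {
  safeParse(input: unknown): { success: boolean };
}

// Hypothetical helper: split sample payloads into those the schema accepts
// and those it rejects.
function partitionPayloads(schema: SchemaLike, payloads: unknown[]) {
  const accepted: unknown[] = [];
  const rejected: unknown[] = [];
  for (const payload of payloads) {
    (schema.safeParse(payload).success ? accepted : rejected).push(payload);
  }
  return { accepted, rejected };
}

// Stand-in for signupPipeline.input (an assumption for this sketch). It
// requires a non-empty name and an email with "@" after the first character.
const signupInputStandIn: SchemaLike = {
  safeParse(input) {
    if (typeof input !== "object" || input === null) return { success: false };
    const record = input as Record<string, unknown>;
    const name = record["name"];
    const email = record["email"];
    const ok =
      typeof name === "string" && name.length > 0 &&
      typeof email === "string" && email.indexOf("@") > 0;
    return { success: ok };
  },
};

const samples = partitionPayloads(signupInputStandIn, [
  { name: "Ada", email: "ada@example.com" },
  { name: "Ada", email: "not-an-email" },
  { email: "ada@example.com" },
]);
```

In a real definition test, the workflow's own input schema would take the place of the stand-in.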

Input validation tests

A workflow rejects input that does not match its input schema before the run executes. Assert that bad input is refused.

import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import signupPipeline from "./signup-pipeline";

describe("signup-pipeline validation", () => {
  it("rejects a malformed email", async () => {
    await expect(
      executeWorkflow(signupPipeline, { name: "Ada", email: "not-an-email" }),
    ).rejects.toThrow();
  });
});

Testing workflows with agent or LLM steps

A workflow that prompts an agent or calls promptLlm() is no longer fully deterministic, so assert on the parts that are stable rather than exact model text.

Test deterministic actions separately. Move logic-heavy steps into actions and unit-test those directly, so the workflow test only has to check orchestration.
Assert on shape, not wording. For a step that returns model text, assert the output is a non-empty string or matches a structured outputSchema, not an exact sentence.
Guard real model runs. Tests that call a real model should skip when no provider key is available, so local and CI runs do not fail without credentials.

import { executeWorkflow } from "@keystrokehq/keystroke/workflow";
import { describe, expect, it } from "vitest";
import jokeFlow from "./joke-flow";

describe("joke-flow workflow", () => {
  it.skipIf(!process.env.ANTHROPIC_API_KEY)("produces a joke and a category", async () => {
    const result = await executeWorkflow(jokeFlow, {});

    expect(result.status).toBe("completed");
    if (result.status === "completed") {
      expect(result.output.joke.length).toBeGreaterThan(0);
      expect(result.output.category).toBeDefined();
    }
  });
});
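
The shape checks can also be gathered into one type guard, which keeps the assertion independent of the model's wording. The following is a sketch only: the JokeOutput type is an assumption (the example above shows only that joke has a length and category is defined), and isJokeOutput is a hypothetical helper.

```typescript
// Assumed output shape for jokeFlow. The source only shows that joke has a
// length and category is defined, so treating both as strings is an assumption.
type JokeOutput = { joke: string; category: string };

// Hypothetical type guard: checks structure only, never the exact text.
function isJokeOutput(value: unknown): value is JokeOutput {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const joke = record["joke"];
  const category = record["category"];
  return (
    typeof joke === "string" &&
    joke.trim().length > 0 &&
    typeof category === "string"
  );
}

// Hand-built values for illustration; a real value would be result.output.
const looksValid = isJokeOutput({ joke: "Why did the test pass?", category: "dev" });
const blankJoke = isJokeOutput({ joke: "   ", category: "dev" });
```

Inside a test, expect(isJokeOutput(result.output)).toBe(true) would then replace the individual field checks.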

Run integration tests from the project root:

keystroke test --project integration
# or
pnpm test -- --project integration

Integration tests load .env when present and skip when required keys are unset. Vitest and its config ship with @keystrokehq/cli, so no project vitest.config.ts is required.
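
When a test needs more than one key, the skip condition can be computed by a small helper. This is a minimal sketch: missingKeys is a hypothetical name, and the environment is passed in as a plain record so the function stays easy to test. In a real test file the argument would be process.env.

```typescript
// Plain record standing in for process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: list the required keys that are unset or blank.
function missingKeys(env: Env, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example environments built by hand for illustration.
const withKey = missingKeys({ ANTHROPIC_API_KEY: "sk-test" }, ["ANTHROPIC_API_KEY"]);
const withoutKey = missingKeys({ ANTHROPIC_API_KEY: "" }, ["ANTHROPIC_API_KEY"]);

// In a test file, the guard could read:
//   it.skipIf(missingKeys(process.env, ["ANTHROPIC_API_KEY"]).length > 0)(...)
```

Treating a blank value as missing avoids running a real model call with an empty key left over from a template .env file.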

Inspect failing runs

When a test fails, or a real run misbehaves, inspect the run before changing code. From the CLI:

keystroke workflows runs list signup-pipeline --status failed
keystroke workflows runs get signup-pipeline <run-id> --include trigger,steps,trace

Adding steps to --include shows each recorded step and where the run failed. For deployed workflows, use History in the web app and filter to workflow runs; the detail panel shows input, output, steps, errors, and trace data. See workflow runs.

Next steps

Build workflows

Compose actions, agents, and durable steps.

Run workflows

Start runs from the CLI, triggers, the API, and agent tools.

Workflow runs

Debug failed runs in the web app.

Deploy a project

Run tests before deploying changed workflows.
