executeWorkflow() and check the result. Where a workflow includes an agent or LLM step, isolate that non-determinism so the rest of the workflow stays easy to assert.
Test workflows at the boundary you care about:
| Test style | Use when |
|---|---|
| Run tests | You want to execute the workflow and assert on its output |
| Definition tests | You want to verify the slug or input/output schema without running |
| Input validation tests | You want to confirm bad input is rejected |
| Integration tests | You want real action, agent, or LLM steps to run end to end |
Vitest and its config ship with @keystrokehq/cli. No project vitest.config.ts is required.
Run a workflow in a test
executeWorkflow() runs one durable pass and resolves to a result you can assert on. The result is a discriminated union: completed with output, failed with error, or suspended when the run hit a durable wait.
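As a rough sketch of how a test might narrow that union before asserting, the helper below returns the output of a completed run and throws for the other two branches. The `RunResult` type is a hypothetical stand-in written for this example (the status strings `"completed"` and `"failed"` and the field names are inferred from the description above); the library's real types may differ.

```typescript
// Hypothetical stand-in for the result shape described above; the real
// types come from the library and may carry more fields.
type RunResult<O> =
  | { status: "completed"; output: O }
  | { status: "failed"; error: unknown }
  | { status: "suspended"; items: unknown[] };

// Narrow a result to its completed output, or throw with a message that
// says which other branch was hit.
function expectCompleted<O>(result: RunResult<O>): O {
  if (result.status === "completed") {
    return result.output;
  }
  if (result.status === "failed") {
    throw new Error("expected completed, got failed: " + String(result.error));
  }
  throw new Error(
    "expected completed, got suspended on " + result.items.length + " item(s)"
  );
}
```

In a test, the value returned by `expectCompleted(...)` is already typed as the workflow's output, so the assertions that follow need no further status checks.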
Asserting on the completed run's output is the most useful test.
Test durable waits
A workflow that calls ctx.sleep() or ctx.hook() suspends instead of completing in one pass. executeWorkflow() returns { status: "suspended", items }, where each item’s kind is "sleep" or "hook". Assert that the run suspended where you expect:
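One way to express that assertion is a small helper that lists what a run is waiting on. This is a minimal sketch: only `status`, `items`, and `kind` come from the description above, and the `SuspendedItem` and `MaybeSuspended` types are hypothetical shapes defined for the example.

```typescript
// Hypothetical shapes: only `status`, `items`, and `kind` are named in the text.
type SuspendedItem = { kind: "sleep" | "hook" };
type MaybeSuspended = { status: string; items?: SuspendedItem[] };

// Return the kinds a run is waiting on, in order; throw if it did not suspend.
function suspendedKinds(result: MaybeSuspended): string[] {
  if (result.status !== "suspended" || !result.items) {
    throw new Error("expected suspended, got " + result.status);
  }
  return result.items.map((item) => item.kind);
}
```

A test can then compare the returned list against the waits it expects, for example a single `"sleep"` for a workflow that pauses once before continuing.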
Stub steps by seeding the event log
To test what a workflow does after an expensive step (an agent, LLM, or HTTP action) without running it, seed the durable event log. Each step is checkpointed as a step_completed event keyed by a correlation id: step:<key>#<occurrence> (#0 for the first call at that spot, #1 for the second). The key is assigned automatically from the call’s position; for unbuilt local runs it falls back to the action/agent slug (step:research-signup#0). Pre-seeding one makes the runner reuse that result instead of executing the step.
runId and the same MemoryEventLog. The stored data is the action’s output and is re-validated against its output schema, so it must be schema-valid. This still exercises the real run orchestration; only the stubbed step body is skipped.
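The correlation id format can be captured in a small helper so tests do not hand-assemble strings. The sketch below builds `step:<key>#<occurrence>` exactly as described; the `SeededStepEvent` shape and its field names are assumptions made for illustration, since the MemoryEventLog API and the real event fields are not shown in this section.

```typescript
// Build the correlation id for a checkpointed step: step:<key>#<occurrence>.
function stepCorrelationId(key: string, occurrence: number = 0): string {
  if (occurrence < 0 || Math.floor(occurrence) !== occurrence) {
    throw new RangeError("occurrence must be a non-negative integer");
  }
  return "step:" + key + "#" + occurrence;
}

// Hypothetical event shape for illustration only; the real event fields
// may be named differently.
type SeededStepEvent = {
  type: "step_completed";
  correlationId: string;
  data: unknown;
};

// Describe a pre-seeded result for one step occurrence.
function seededStep(key: string, data: unknown, occurrence: number = 0): SeededStepEvent {
  return {
    type: "step_completed",
    correlationId: stepCorrelationId(key, occurrence),
    data,
  };
}
```

For the slug fallback mentioned above, `stepCorrelationId("research-signup")` yields `step:research-signup#0`, and passing `1` as the occurrence targets the second call at the same spot.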
Definition tests
Definition tests are fast and never run the workflow. Use them to catch accidental slug changes or schema edits that would break callers and triggers.
Because input and output are Zod schemas, you can parse sample payloads against them directly.
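A definition test along these lines might run a list of sample payloads through a schema and report which ones fail. The sketch relies only on `safeParse`, which Zod schemas provide; `SchemaLike` is a structural type introduced here, and `emailInput` is a hand-rolled stand-in (not a real workflow schema) so the example runs without Zod installed.

```typescript
// Structural subset of a Zod schema: safeParse reports success without throwing.
type SchemaLike = { safeParse(value: unknown): { success: boolean } };

// Return the indexes of samples the schema rejects; empty means all passed.
function rejectedSamples(schema: SchemaLike, samples: unknown[]): number[] {
  const rejected: number[] = [];
  samples.forEach((sample, index) => {
    if (!schema.safeParse(sample).success) {
      rejected.push(index);
    }
  });
  return rejected;
}

// Hand-rolled stand-in so the sketch runs without Zod; in a real test you
// would pass the workflow's own input or output schema instead.
const emailInput: SchemaLike = {
  safeParse(value) {
    const ok =
      typeof value === "object" &&
      value !== null &&
      typeof (value as { email?: unknown }).email === "string";
    return { success: ok };
  },
};
```

Keeping a few known-good and known-bad payloads next to the test makes an accidental schema edit show up as a changed index list rather than a failure at runtime.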
Input validation tests
A workflow rejects input that does not match its input schema before run executes. Assert that bad input is refused.
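This section does not say whether a refusal surfaces as a thrown error or as a failed result, so the sketch below treats either as a rejection. That choice, along with the `wasRejected` helper itself, is an assumption; tighten it to the one behaviour your runner actually exhibits.

```typescript
// Run an async call and report whether it was refused. Whether rejection
// appears as a thrown error or as a { status: "failed" } result is an
// assumption here, so both count as rejection.
async function wasRejected(
  run: () => Promise<{ status: string }>
): Promise<boolean> {
  try {
    const result = await run();
    return result.status === "failed";
  } catch {
    return true;
  }
}
```

In a test, wrap the call that executes the workflow with bad input in the `run` callback and assert that the helper resolves to `true`.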
Testing workflows with agent or LLM steps
A workflow that prompts an agent or calls promptLlm() is no longer fully deterministic, so assert on the parts that are stable rather than exact model text.
- Test deterministic actions separately. Move logic-heavy steps into actions and unit-test those directly, so the workflow test only has to check orchestration.
- Assert on shape, not wording. For a step that returns model text, assert the output is a non-empty string or matches a structured outputSchema, not an exact sentence.
- Guard real model runs. Tests that call a real model should skip when no provider key is available, so local and CI runs do not fail without credentials.
.env when present and skip when required keys are unset. Vitest and its config ship with @keystrokehq/cli, so no project vitest.config.ts is required.
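The shape check and the credentials guard can both be written as plain functions, sketched below. The helper names are invented for this example, and the environment is passed in as an argument so the functions stay easy to test.

```typescript
// Names of required keys that are unset or blank in the given environment.
function missingKeys(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Shape check for model text: a string with at least one non-space character.
function isNonEmptyText(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}
```

With Vitest, the guard can feed a conditional skip such as `describe.skipIf(missingKeys(process.env, ["EXAMPLE_PROVIDER_KEY"]).length > 0)`, where `EXAMPLE_PROVIDER_KEY` is a placeholder for whichever key your provider needs.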
Inspect failing runs
When a test fails, or a real run misbehaves, inspect the run before changing code. From the CLI:
steps include shows each recorded step and where the run failed. For deployed workflows, use History in the web app and filter to workflow runs; the detail panel shows input, output, steps, errors, and trace data. See workflow runs.
Next steps
- Build workflows. Compose actions, agents, and durable steps.
- Run workflows. Start runs from the CLI, triggers, the API, and agent tools.
- Workflow runs. Debug failed runs in the web app.
- Deploy a project. Run tests before deploying changed workflows.