
Validate email categorizations, data extractions, and downstream actions before they touch production traffic.

The ERM Test Suite is the QA layer for your Email Reply Management workflows. Use it to verify that incoming replies are categorized correctly, that the right fields are extracted, and that your skills behave as expected, before any of it runs against live inbound mail.

Every workspace ships with a baseline suite covering the out-of-the-box categorizations. From there, you can layer in custom tests for the scenarios the baseline doesn't cover: edge cases, custom skills your team has built, or regression checks for categorizations you've previously seen drift.
Treat the suite like a smart campaign QA checklist. Add a test the first time you encounter an edge case in production; that way the next time something similar comes through, you’ll catch it pre-deploy instead of in the inbox.
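Conceptually, each test bundles a sample reply with the outcome you expect. Here is a minimal sketch of that shape in Python; the class and field names are illustrative, not the product's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionCheck:
    """One assertion against a field the skill extracts from a reply."""
    field_name: str            # e.g. "reason"
    expected_value: str        # e.g. "out of office"
    comparison: str = "exact"  # "exact" or "semantic"

@dataclass
class ErmTest:
    """A single test case: a sample reply plus the outcome it asserts."""
    name: str                  # e.g. "OOO - multi-day absence"
    body: str                  # raw reply body, as it would arrive in production
    expected_category: str     # the categorization the test asserts against
    subject: str = ""          # optional, but a strong signal for some categories
    sender: str = ""           # optional sender address
    recipient: str = ""        # optional recipient/alias
    checks: list[ExtractionCheck] = field(default_factory=list)
```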

When to use the test suite

  • Before (or after) promoting a new skill to production. Confirm that any new or modified categorization handles your common reply patterns.
  • After a model or prompt change. Re-run the full suite to surface regressions in previously passing categorizations.
  • When debugging a misclassification. Reproduce the offending email as a test case, then iterate against it without touching live data.
  • As part of a recurring health check. Run the suite on a cadence to catch drift in OOTB categorizations.

Test suite overview

The landing page surfaces the state of every test in your workspace and lets you slice the list by status or categorization.

Filters

Narrow the list down to what you're working on: filter by run status (pass/fail) or by expected categorization.

Test list columns

Each row in the test list shows:

  • Name — the label you gave the test. Use a convention that scales (e.g. OOO – multi-day absence, Bounce – mailbox full).
  • Expected category — the categorization the test asserts against.
  • Checks — whether the test asserts categorization only, or also validates extracted fields.
  • Status — pass/fail indicator from the most recent run.
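With tests in the shape sketched earlier, the status and categorization filters reduce to simple predicates. A sketch, with the result pairing assumed for illustration:

```python
def failing_tests(results: list[tuple[ErmTest, str]]) -> list[ErmTest]:
    """Return tests whose most recent run failed.
    `results` pairs each ErmTest with its last run status ("pass"/"fail")."""
    return [test for test, status in results if status == "fail"]

def tests_for_category(results: list[tuple[ErmTest, str]], category: str) -> list[ErmTest]:
    """Return tests asserting a given expected categorization."""
    return [test for test, _ in results if test.expected_category == category]
```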

Creating a test

1. Click + New Test

Open the test creation form from the test suite landing page.
2. Name the test and paste the reply body

Give the test a descriptive, scannable name and paste the email body you want the system to evaluate. Treat the body the way it will arrive in production — keep signatures, quoted history, and formatting intact if those are part of what makes the case representative.
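As an illustration, a production-faithful out-of-office reply keeps its signature and quoted thread; the content below is invented for the example:

```python
# An invented but production-faithful reply body: signature and quoted
# history left intact, since both can influence how the reply is read.
ooo_body = """\
Hi, I'm out of the office until Monday, June 9, with limited email access.
For anything urgent, please reach out to ops@example.com.

Best,
Dana
Regional Sales Lead

> On Tue, Jun 3, you wrote:
> Just following up on the quote we sent over last week...
"""
```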
3. Add sender, recipient, alias, and subject (optional)

These fields are optional, but they're worth filling in when the categorization could plausibly depend on them: for example, when an alias is part of the routing logic or when the sender domain is a signal. The subject does affect classification. If you're testing a category where the subject is a strong signal (auto-replies, bounces, "Re:" threading), set it explicitly.
Want to confirm a categorization is robust? Duplicate the test with the same body but different subjects. If results diverge, you’ve found a fragility worth flagging.
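If you script your test definitions, that duplication is one dataclasses.replace call per subject; the helper below reuses the hypothetical ErmTest shape from earlier:

```python
from dataclasses import replace

def subject_variants(base: ErmTest, subjects: list[str]) -> list[ErmTest]:
    """Clone a test with the same body but different subjects, to check
    that the categorization doesn't hinge on one subject phrasing."""
    return [
        replace(base, name=f"{base.name} [subject: {subj or '(empty)'}]", subject=subj)
        for subj in subjects
    ]

# e.g. subject_variants(ooo_test, ["Re: Q3 quote", "Automatic reply: Q3 quote", ""])
```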
4. Set expected category and add extraction checks (if your skill extracts data)

For skills that pull structured data out of replies, add one check per field you want to validate:
  • Field name — the extracted field you’re asserting against
  • Expected value — the value you expect to be returned
  • Comparison type — Exact or Semantic

Exact matches the expected value verbatim against what was extracted. Use it for structured values where wording is stable: dates, dollar amounts, order numbers, statuses.

Semantic matches on meaning rather than wording. Use it for free-text fields where the model may paraphrase: reasons, intents, summarized requests. A semantic check on reason = "out of office" will pass on "away from the office until Monday".
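The distinction is easiest to see in code. In the sketch below, exact comparison really is string equality, while the token-overlap heuristic is only a stand-in to illustrate the semantic contract; the product's matcher is meaning-based, not word-based:

```python
import re

def _tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z']+", s.lower()))

def exact_match(expected: str, extracted: str) -> bool:
    """Verbatim comparison: dates, dollar amounts, order numbers, statuses."""
    return expected.strip() == extracted.strip()

def semantic_match(expected: str, extracted: str, threshold: float = 0.5) -> bool:
    """Crude stand-in for the product's meaning-based comparison.
    Token overlap only illustrates the pass/fail contract; the real
    matcher is model-based and accepts paraphrases (e.g. "away from
    the office until Monday" for "out of office") that plain word
    overlap would miss."""
    a, b = _tokens(expected), _tokens(extracted)
    return bool(a and b) and len(a & b) / len(a | b) >= threshold
```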
5. Save and run

Save the test, then execute it from one of two places:
  • Run All Tests at the top of the suite — useful after a model change or before a release
  • The Run button on the test row — useful while iterating on a single case
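To make the run semantics concrete, here is what a run reduces to, sketched locally against the hypothetical shapes above. The real suite runs server-side; `pipeline` stands in for your ERM skill:

```python
from typing import Callable

# Stand-in for the ERM skill under test: takes a test's inputs and is
# assumed to return (category, {field_name: extracted_value}).
Pipeline = Callable[[ErmTest], tuple[str, dict[str, str]]]

def run_test(test: ErmTest, pipeline: Pipeline) -> bool:
    """Run one test: check the categorization, then every extraction check."""
    category, fields = pipeline(test)
    if category != test.expected_category:
        return False
    for check in test.checks:
        got = fields.get(check.field_name, "")
        matcher = exact_match if check.comparison == "exact" else semantic_match
        if not matcher(check.expected_value, got):
            return False
    return True

def run_all(tests: list[ErmTest], pipeline: Pipeline) -> dict[str, bool]:
    """'Run All Tests': map each test name to its pass/fail result."""
    return {test.name: run_test(test, pipeline) for test in tests}
```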
6. Click into your tests

Click into any test to review it in detail. From there you can Edit, Rerun, Debug, or Delete the test.

Best practices

  • One assertion per test where possible. If a test fails, you want to know exactly which behavior broke. Bundling six extraction checks into one case makes triage harder.
  • Mirror production inputs. Real replies have signatures, disclaimers, and forwarded threads. Stripping them out makes tests pass that wouldn’t pass in the wild.
  • Cover the negatives. Don't only test the categories you expect to hit; also test cases that shouldn't match a category, to make sure the skill isn't over-firing (see the sketch after this list).
  • Re-run after every change. Treat the suite as your release gate: prompt edits, skill changes, and new categorizations should all be followed by a full run.
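Continuing the earlier sketch, a negative case asserts what a reply must not be categorized as; the helper is illustrative:

```python
def run_negative(test: ErmTest, forbidden_category: str, pipeline: Pipeline) -> bool:
    """Pass if the reply is NOT routed to the forbidden category.
    Guards against a skill over-firing on near-miss replies, e.g. a
    reply saying "I'll be in the office all week" must not be
    categorized as out of office."""
    category, _ = pipeline(test)
    return category != forbidden_category
```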