Validate email categorizations, data extractions, and downstream actions before they touch production traffic.
The ERM Test Suite is the QA layer for your Email Reply Management workflows. Use it to verify that incoming replies are categorized correctly, that the right fields are extracted, and that your skills behave as expected, before any of it runs against live inbound mail. Every workspace ships with a baseline suite covering the out-of-the-box categorizations. From there, you can layer in custom tests for the scenarios the baseline doesn’t cover: edge cases, custom skills your team has built, or regression checks for categorizations you’ve previously seen drift.
When to use the test suite
- Before (or after) promoting a new skill to production. Confirm that any new or modified categorization handles your common reply patterns.
- After a model or prompt change. Re-run the full suite to surface regressions in previously passing categorizations.
- When debugging a misclassification. Reproduce the offending email as a test case, then iterate against it without touching live data.
- As part of a recurring health check. Run the suite on a regular cadence to catch drift in out-of-the-box (OOTB) categorizations.
Test suite overview
The landing page surfaces the state of every test in your workspace and lets you slice the list by status or categorization.
Filters
Narrow the list down to what you’re working on by filtering on status or expected category.
Test list columns
| Column | What it shows |
|---|---|
| Name | The label you gave the test. Use a convention that scales (e.g. OOO – multi-day absence, Bounce – mailbox full). |
| Expected category | The categorization the test asserts against. |
| Checks | Whether the test asserts categorization only, or also validates extracted fields. |
| Status | Pass / fail indicator from the most recent run. |
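Filtering the list is, in effect, a predicate over these columns. As a rough mental model only (the `SuiteRow` shape, the `filter_rows` helper, and the category names "Out of office" and "Bounce" are hypothetical, not an Allgood API), narrowing by status and expected category looks like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuiteRow:
    """Hypothetical model of one row in the test list (not an Allgood API)."""
    name: str
    expected_category: str
    checks: str  # e.g. "Categorization only" or "Categorization + extraction"
    status: str  # "Pass" or "Fail" from the most recent run


def filter_rows(rows: list[SuiteRow], status: Optional[str] = None,
                category: Optional[str] = None) -> list[SuiteRow]:
    """Keep rows that match every filter that is set; None means 'any value'."""
    return [
        row for row in rows
        if (status is None or row.status == status)
        and (category is None or row.expected_category == category)
    ]


rows = [
    SuiteRow("OOO – multi-day absence", "Out of office", "Categorization only", "Pass"),
    SuiteRow("Bounce – mailbox full", "Bounce", "Categorization only", "Fail"),
    SuiteRow("Bounce – unknown user", "Bounce", "Categorization only", "Pass"),
]
failing_bounces = filter_rows(rows, status="Fail", category="Bounce")
print([row.name for row in failing_bounces])  # ['Bounce – mailbox full']
```

A naming convention like the one in the Name column pays off here: names that lead with the category stay easy to scan once the list is filtered.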
Creating a test
Name the test and paste the reply body
Give the test a descriptive, scannable name and paste the email body you want the system to evaluate. Treat the body the way it will arrive in production — keep signatures, quoted history, and formatting intact if those are part of what makes the case representative.
Add sender, recipient, alias and subject (optional)
These fields are optional, but they’re worth filling in when the categorization could plausibly depend on them, for example when an alias is part of the routing logic or when the sender domain is a signal.
The subject does affect classification. If you’re testing a category where the subject is a strong signal (auto-replies, bounces, “Re:” threading), set it explicitly.
Want to confirm a categorization is robust? Duplicate the test with the same body but different subjects. If results diverge, you’ve found a fragility worth flagging.
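The duplicate-and-vary-the-subject check boils down to a loop: classify the same body under several subjects and flag the case if the results disagree. The sketch below assumes a hypothetical `classify(body, subject)` callable standing in for whatever produces the categorization; the toy classifier and category names are illustrative only. In the product itself, you would do this by duplicating the test and comparing results across the copies.

```python
from typing import Callable

# Hypothetical stand-in for the categorizer: (body, subject) -> category.
Classifier = Callable[[str, str], str]


def subject_sensitivity(classify: Classifier, body: str,
                        subjects: list[str]) -> dict[str, str]:
    """Classify one body under several subjects; return subject -> category."""
    return {subject: classify(body, subject) for subject in subjects}


def is_fragile(results: dict[str, str]) -> bool:
    """Fragile if changing only the subject changes the category."""
    return len(set(results.values())) > 1


# Toy classifier for illustration only: it leans on the subject line,
# which is exactly the kind of dependence this check is meant to expose.
def toy_classify(body: str, subject: str) -> str:
    if "automatic reply" in subject.lower():
        return "Out of office"
    return "Interested" if "sounds good" in body.lower() else "Other"


body = "Sounds good, let's find time next week."
results = subject_sensitivity(
    toy_classify, body, ["Re: Intro", "Automatic reply: Intro", "Quick question"]
)
print(is_fragile(results))  # True
```

Here the same body lands in two different categories depending on the subject alone, which is the divergence worth flagging.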
Set expected category and add extraction checks (if your skill extracts data)
For skills that pull structured data out of replies, add one check per field you want to validate:
- Field name: the extracted field you’re asserting against
- Expected value: the value you expect to be returned
- Comparison type: Exact or Semantic
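Taken together, a test with extraction checks can be thought of as one category assertion plus a list of field checks, and it passes only if every assertion holds. The sketch below is a hypothetical model: `ErmTest`, `FieldCheck`, `evaluate`, the field names, and the category name are illustrative, not an Allgood API. `Exact` is modeled as plain string equality; this page doesn’t specify how `Semantic` matches are judged, so the sketch takes that judgment as a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FieldCheck:
    field_name: str
    expected_value: str
    comparison: str = "Exact"  # "Exact" or "Semantic"


@dataclass
class ErmTest:
    name: str
    body: str
    expected_category: str
    subject: Optional[str] = None  # optional, but it can affect classification
    checks: list[FieldCheck] = field(default_factory=list)


def evaluate(case: ErmTest, category: str, extracted: dict[str, str],
             semantic_match: Callable[[str, str], bool]) -> list[str]:
    """Return failure messages for one run; an empty list means the test passed."""
    failures = []
    if category != case.expected_category:
        failures.append(f"category: expected {case.expected_category!r}, got {category!r}")
    for check in case.checks:
        actual = extracted.get(check.field_name)
        if actual is None:
            ok = False  # the field was not extracted at all
        elif check.comparison == "Exact":
            ok = actual == check.expected_value
        else:  # "Semantic": delegate to the caller-supplied judge
            ok = semantic_match(actual, check.expected_value)
        if not ok:
            failures.append(
                f"{check.field_name}: expected {check.expected_value!r}, got {actual!r}"
            )
    return failures


# Crude placeholder for illustration only; real semantic matching is not case folding.
def loose_match(actual: str, expected: str) -> bool:
    return actual.strip().lower() == expected.strip().lower()


case = ErmTest(
    name="OOO – return date",
    body="I'm out of the office until June 3, 2024 with limited access to email.",
    expected_category="Out of office",
    subject="Automatic reply: Intro",
    checks=[
        FieldCheck("return_date", "2024-06-03"),
        FieldCheck("reason", "away from office", comparison="Semantic"),
    ],
)
print(evaluate(case, "Out of office",
               {"return_date": "2024-06-03", "reason": "Away from office"},
               loose_match))  # []
print(evaluate(case, "Out of office", {"return_date": "2024-06-04"}, loose_match))
```

The second call reports one message per broken check, which is also why the best practices below favor fewer assertions per test: each message points at exactly one behavior.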
Save and run
Save the test, then execute it from one of two places:
- Run All Tests at the top of the suite — useful after a model change or before a release
- The Run button on the test row — useful while iterating on a single case
Best practices
- One assertion per test where possible. If a test fails, you want to know exactly which behavior broke. Bundling six extraction checks into one case makes triage harder.
- Mirror production inputs. Real replies have signatures, disclaimers, and forwarded threads. Stripping them out makes tests pass that wouldn’t pass in the wild.
- Cover the negatives. Don’t only test the categories you expect to hit — test cases that shouldn’t match a category to make sure the skill isn’t over-firing.
- Re-run after every change. Treat the suite as your release gate: prompt edits, skill changes, and new categorizations should all be followed by a full run.
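Used as a release gate, the rule is simple: a change ships only if every test passes. A minimal sketch, assuming you have the most recent Run All Tests results as a name-to-status mapping (the `release_gate` helper and the mapping are hypothetical; this page doesn’t describe an export or API for results):

```python
def release_gate(results: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (ok, failing test names); any status other than "Pass" blocks."""
    failing = sorted(name for name, status in results.items() if status != "Pass")
    return (not failing, failing)


# Hypothetical results from a Run All Tests pass, keyed by test name.
latest = {
    "OOO – multi-day absence": "Pass",
    "Bounce – mailbox full": "Fail",
    "Not OOO – vacation mentioned in passing": "Pass",  # a negative case
}
ok, failing = release_gate(latest)
print("ship" if ok else f"blocked by: {failing}")  # blocked by: ['Bounce – mailbox full']
```

Treating anything other than an explicit pass as a failure keeps the gate conservative: a test that didn’t run can’t slip through.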