
Validate email categorizations, data extractions, and downstream actions before they touch production traffic.

The ERM Test Suite is the QA layer for your Email Reply Management workflows. Use it to verify that incoming replies are categorized correctly, that the right fields are extracted, and that your skills behave as expected, before any of it runs against live inbound mail.

Every workspace ships with a baseline suite covering the out-of-the-box categorizations. From there, you can layer in custom tests for the scenarios the baseline doesn't cover: edge cases, custom skills your team has built, or regression checks for categorizations you've previously seen drift.
Treat the suite like a smart campaign QA checklist. Add a test the first time you encounter an edge case in production; that way the next time something similar comes through, you’ll catch it pre-deploy instead of in the inbox.
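Conceptually, each test bundles a sample reply with the outcome you expect. Here is a minimal sketch of that shape in Python; the class and field names are illustrative, not the product's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionCheck:
    """One assertion against a field the skill extracts from a reply."""
    field_name: str            # e.g. "reason"
    expected_value: str        # e.g. "out of office"
    comparison: str = "exact"  # "exact" or "semantic"

@dataclass
class ErmTest:
    """A single test case: a sample reply plus the outcome it asserts."""
    name: str                  # e.g. "OOO - multi-day absence"
    body: str                  # raw reply body, as it would arrive in production
    expected_category: str     # the categorization the test asserts against
    subject: str = ""          # optional, but a strong signal for some categories
    sender: str = ""           # optional sender address
    recipient: str = ""        # optional recipient/alias
    checks: list[ExtractionCheck] = field(default_factory=list)
```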

When to use the test suite

  • Before (or after) promoting a new skill to production. Confirm that any new or modified categorization handles your common reply patterns.
  • After a model or prompt change. Re-run the full suite to surface regressions in previously passing categorizations.
  • When debugging a misclassification. Reproduce the offending email as a test case, then iterate against it without touching live data.
  • As part of a recurring health check. Run the suite on a cadence to catch drift in OOTB categorizations.

Test suite overview

The landing page surfaces the state of every test in your workspace and lets you slice the list by status or categorization.

Filters

Narrow the list down to what you're working on: filter by run status (pass/fail) or by expected categorization.

Test list columns

Each row in the test list shows:

  • Name — the label you gave the test. Use a convention that scales (e.g. OOO – multi-day absence, Bounce – mailbox full).
  • Expected category — the categorization the test asserts against.
  • Checks — whether the test asserts categorization only, or also validates extracted fields.
  • Status — pass/fail indicator from the most recent run.
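With tests in the shape sketched earlier, the status and categorization filters reduce to simple predicates. A sketch, with the result pairing assumed for illustration:

```python
def failing_tests(results: list[tuple[ErmTest, str]]) -> list[ErmTest]:
    """Return tests whose most recent run failed.
    `results` pairs each ErmTest with its last run status ("pass"/"fail")."""
    return [test for test, status in results if status == "fail"]

def tests_for_category(results: list[tuple[ErmTest, str]], category: str) -> list[ErmTest]:
    """Return tests asserting a given expected categorization."""
    return [test for test, _ in results if test.expected_category == category]
```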

Creating a test

1. Click + New Test

Open the test creation form from the test suite landing page.
2. Name the test and paste the reply body

Give the test a descriptive, scannable name and paste the email body you want the system to evaluate. Treat the body the way it will arrive in production — keep signatures, quoted history, and formatting intact if those are part of what makes the case representative.
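As an illustration, a production-faithful out-of-office reply keeps its signature and quoted thread; the content below is invented for the example:

```python
# An invented but production-faithful reply body: signature and quoted
# history left intact, since both can influence how the reply is read.
ooo_body = """\
Hi, I'm out of the office until Monday, June 9, with limited email access.
For anything urgent, please reach out to ops@example.com.

Best,
Dana
Regional Sales Lead

> On Tue, Jun 3, you wrote:
> Just following up on the quote we sent over last week...
"""
```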
3. Add sender, recipient, alias, and subject (optional)

These fields are optional, but they're worth filling in when the categorization could plausibly depend on them: for example, when an alias is part of the routing logic or when the sender domain is a signal. The subject does affect classification. If you're testing a category where the subject is a strong signal (auto-replies, bounces, "Re:" threading), set it explicitly.
Want to confirm a categorization is robust? Duplicate the test with the same body but different subjects. If results diverge, you’ve found a fragility worth flagging.
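If you script your test definitions, that duplication is one dataclasses.replace call per subject; the helper below reuses the hypothetical ErmTest shape from earlier:

```python
from dataclasses import replace

def subject_variants(base: ErmTest, subjects: list[str]) -> list[ErmTest]:
    """Clone a test with the same body but different subjects, to check
    that the categorization doesn't hinge on one subject phrasing."""
    return [
        replace(base, name=f"{base.name} [subject: {subj or '(empty)'}]", subject=subj)
        for subj in subjects
    ]

# e.g. subject_variants(ooo_test, ["Re: Q3 quote", "Automatic reply: Q3 quote", ""])
```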
4. Set expected category and add extraction checks (if your skill extracts data)

For skills that pull structured data out of replies, add one check per field you want to validate:
  • Field name — the extracted field you’re asserting against
  • Expected value — the value you expect to be returned
  • Comparison type — Exact or Semantic

Exact matches the expected value verbatim against what was extracted. Use it for structured values where wording is stable: dates, dollar amounts, order numbers, statuses.

Semantic matches on meaning rather than wording. Use it for free-text fields where the model may paraphrase: reasons, intents, summarized requests. A semantic check on reason = "out of office" will pass on "away from the office until Monday".
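The distinction is easiest to see in code. In the sketch below, exact comparison really is string equality, while the token-overlap heuristic is only a stand-in to illustrate the semantic contract; the product's matcher is meaning-based, not word-based:

```python
import re

def _tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z']+", s.lower()))

def exact_match(expected: str, extracted: str) -> bool:
    """Verbatim comparison: dates, dollar amounts, order numbers, statuses."""
    return expected.strip() == extracted.strip()

def semantic_match(expected: str, extracted: str, threshold: float = 0.5) -> bool:
    """Crude stand-in for the product's meaning-based comparison.
    Token overlap only illustrates the pass/fail contract; the real
    matcher is model-based and accepts paraphrases (e.g. "away from
    the office until Monday" for "out of office") that plain word
    overlap would miss."""
    a, b = _tokens(expected), _tokens(extracted)
    return bool(a and b) and len(a & b) / len(a | b) >= threshold
```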
5. Save and run

Save the test, then execute it from one of two places:
  • Run All Tests at the top of the suite — useful after a model change or before a release
  • The Run button on the test row — useful while iterating on a single case
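To make the run semantics concrete, here is what a run reduces to, sketched locally against the hypothetical shapes above. The real suite runs server-side; `pipeline` stands in for your ERM skill:

```python
from typing import Callable

# Stand-in for the ERM skill under test: takes a test's inputs and is
# assumed to return (category, {field_name: extracted_value}).
Pipeline = Callable[[ErmTest], tuple[str, dict[str, str]]]

def run_test(test: ErmTest, pipeline: Pipeline) -> bool:
    """Run one test: check the categorization, then every extraction check."""
    category, fields = pipeline(test)
    if category != test.expected_category:
        return False
    for check in test.checks:
        got = fields.get(check.field_name, "")
        matcher = exact_match if check.comparison == "exact" else semantic_match
        if not matcher(check.expected_value, got):
            return False
    return True

def run_all(tests: list[ErmTest], pipeline: Pipeline) -> dict[str, bool]:
    """'Run All Tests': map each test name to its pass/fail result."""
    return {test.name: run_test(test, pipeline) for test in tests}
```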
6. Click into your tests

Click into any test to review it in detail. From there you can Edit, Rerun, Debug, or Delete the test.

Best practices

  • One assertion per test where possible. If a test fails, you want to know exactly which behavior broke. Bundling six extraction checks into one case makes triage harder.
  • Mirror production inputs. Real replies have signatures, disclaimers, and forwarded threads. Stripping them out makes tests pass that wouldn’t pass in the wild.
  • Cover the negatives. Don't only test the categories you expect to hit; also test cases that shouldn't match a category, to make sure the skill isn't over-firing (see the sketch after this list).
  • Re-run after every change. Treat the suite as your release gate: prompt edits, skill changes, and new categorizations should all be followed by a full run.
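Continuing the earlier sketch, a negative case asserts what a reply must not be categorized as; the helper is illustrative:

```python
def run_negative(test: ErmTest, forbidden_category: str, pipeline: Pipeline) -> bool:
    """Pass if the reply is NOT routed to the forbidden category.
    Guards against a skill over-firing on near-miss replies, e.g. a
    reply saying "I'll be in the office all week" must not be
    categorized as out of office."""
    category, _ = pipeline(test)
    return category != forbidden_category
```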