Catch the prompt or model change that broke isolation.
The published research is uncomfortable: stronger embedding models leak more. The team that swaps MiniLM for mpnet-base to improve recall accidentally raises their Retrieval-Pivot Rate from 62% to 95%. The team that tightens a prompt to be more helpful accidentally surfaces a fine-tune adapter's memorised content. Sectum AI's baseline engine catches it.
How regression baselines work
- Save the baseline: sectum-ai baseline --save records the current run's metrics (Retrieval-Pivot Rate per embedding model, per-probe confirmed-finding counts, side-channel effect sizes, erasure residue per surface) into baseline.json.
- Re-run after a change: a CI step runs sectum-ai probe, then sectum-ai baseline --compare.
- Get the verdict: each metric is compared against the baseline; any metric that moved in the worse direction is flagged REGRESSED. The CI step exits 2, the same exit code as a fresh confirmed finding, so existing CI gates pick it up.
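The comparison step described above can be pictured with a minimal sketch. This is not Sectum AI's implementation: the metric names, the flat dictionary layout, and the rule that every metric is "higher is worse" are assumptions made for illustration, and the real baseline.json schema may differ.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose value rose above the baseline.

    Assumption: every metric is "higher is worse" (leak rates, finding counts).
    Assumption: a metric missing from the baseline starts at 0.
    """
    regressed = []
    for name, value in sorted(current.items()):
        if value > baseline.get(name, 0.0):
            regressed.append(name)
    return regressed


def exit_code(regressed: List[str]) -> int:
    # Mirrors the convention in the text: 0 = no regression, 2 = regression.
    return 2 if regressed else 0


# Hypothetical metric names, for illustration only.
baseline = {"retrieval_pivot_rate.minilm": 0.62, "per_probe_findings.class_9": 0}
current = {"retrieval_pivot_rate.minilm": 0.62, "per_probe_findings.class_9": 3}

regressed = find_regressions(baseline, current)
print(regressed, exit_code(regressed))
```

In this sketch a single worsened metric is enough to produce exit code 2, which matches the "any metric that moved in the worse direction" rule.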
What the baseline catches
Embedding model upgrades
Swapping MiniLM for a stronger model is the canonical regression. Recall goes up; Retrieval-Pivot Rate goes up with it. The baseline catches the trade-off before it ships.
Prompt or system-message rewrites
A more helpful system message can flip a probe from “refused” to “leaked.” The baseline catches the cliff.
Adapter / fine-tune rollouts
A newly deployed LoRA adapter that memorised a canary phrase surfaces it in the next probe run. The per_probe_findings count for Class 9 jumps, and the baseline flags the regression.
Cache-key changes
A semantic-cache refactor that accidentally drops the tenant scope from the key shows up as Class 4 findings appearing where there were none before.
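To make the cache-key failure concrete, here is a simplified sketch. It uses exact-match hashing as a stand-in; a real semantic cache matches on embedding similarity, but the tenant-scoping principle is the same. The function names are hypothetical and do not come from Sectum AI or any particular cache library.

```python
import hashlib


def cache_key(tenant_id: str, prompt: str) -> str:
    """Tenant-scoped key: the same prompt from two tenants never collides."""
    # The length prefix keeps ("ab", "c") and ("a", "bc") from hashing alike.
    material = f"{len(tenant_id)}:{tenant_id}:{prompt}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def unscoped_cache_key(prompt: str) -> str:
    """The regression: tenant scope dropped, so all tenants share entries."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


prompt = "summarise last quarter's invoices"
print(cache_key("tenant-a", prompt) == cache_key("tenant-b", prompt))  # scoped keys differ
print(unscoped_cache_key(prompt) == unscoped_cache_key(prompt))        # unscoped keys collide
```

With the unscoped key, a response cached for one tenant can be served to another tenant that sends the same prompt, which is the kind of cross-tenant leak a probe run would report as a new finding.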
Wire it into CI
The OSS already supports this end-to-end. A minimal GitHub Actions step:
- name: Sectum AI probe + regression check
  run: |
    uv run sectum-ai probe --workdir .sectum-ai --output json > probe-summary.json
    uv run sectum-ai baseline --workdir .sectum-ai --compare
  env:
    # ... secrets resolved by sectum-ai.yaml at runtime
Exit code 0 = no regression. Exit code 2 = a metric moved in the worse direction; CI fails the build. The probe-summary.json file captures the headline metrics for your dashboard.
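If you drive the check from a script rather than a CI step, the exit-code contract above can be wrapped as follows. This is a sketch under assumptions: only codes 0 and 2 are documented here, so treating every other code as a tool or environment error is a guess, and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def classify(returncode: int) -> str:
    """Map a process exit code to a verdict string."""
    if returncode == 0:
        return "pass"
    if returncode == 2:
        return "regressed"
    # Assumption: anything else means the tool itself failed to run cleanly.
    return "error"


def run_and_classify(cmd: Sequence[str]) -> str:
    # check=False so a non-zero exit is returned instead of raising.
    completed = subprocess.run(cmd, check=False)
    return classify(completed.returncode)


# Stand-in command that exits with code 2. In a real pipeline you would pass
# ["uv", "run", "sectum-ai", "baseline", "--workdir", ".sectum-ai", "--compare"].
print(run_and_classify([sys.executable, "-c", "raise SystemExit(2)"]))
```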
When to upgrade from the OSS
The OSS gives you the engineering CI loop. Upgrade to a hosted SKU when:
- You want scheduled runs against a long-lived environment (e.g., staging) and the evidence pack delivered each cycle without a CI trigger
- You want baselines maintained across releases with historical trending in a dashboard, not just a single baseline.json file
- You want threshold alerting when a metric crosses a defined ceiling, surfaced to your security / on-call channel
Engagement
OSS is free under Apache-2.0 — that's the CI integration path. For continuous, managed verification with the baseline maintained across runs in a dashboard, start an engagement for a quote.