Sectum AI vs Promptfoo

TL;DR. Promptfoo is the broadest open-source LLM red-team and evaluation framework: 50+ vulnerability types, 300,000+ developers, 127 Fortune 500 companies, and acquired by OpenAI in March 2026. Its primary unit of analysis is a prompt. Sectum AI is a multi-tenant infrastructure verifier: its unit of analysis is a tenant boundary across the AI stack, and its output is a tamper-evident, control-mapped evidence pack that an auditor can accept. Use Promptfoo in your CI for fast prompt-level red-teaming. Use Sectum AI periodically, and on every GDPR Article 17 erasure ticket and audit cycle, for tenant-boundary evidence.

What changed in March 2026

OpenAI announced the acquisition of Promptfoo on March 9, 2026. Co-founders Ian Webster and Michael D’Angelo joined OpenAI. The core CLI remains open source under the MIT license, and the enterprise tier continues. The acquisition is strong validation that LLM red-teaming is now a strategic line item, and it means the leading OSS framework now belongs to the model vendor with the largest market position.

The two products

Promptfoo (promptfoo.dev)

Category: open-source LLM evaluation + red-team / vulnerability scanning framework.

License: MIT (OSS core). Commercial enterprise tier on top.

Footprint: 13.2k GitHub stars, 255 contributors, 300,000+ developers, 127 Fortune 500 companies. Used by OpenAI and Anthropic (named on the GitHub README).

Capability surface:

Sectum AI (sectum.ai)

Category: multi-tenant AI verification, not a red-team framework. The deliverable is auditor-acceptable evidence that the tenant boundary holds across the AI stack: RFC 3161 timestamped, logged to Sigstore Rekor, attested with in-toto, and mapped to controls.

License: Apache 2.0 for the OSS core (substrate, attack catalog, adapters, evidence chain, and the sectum-ai verify command). Hosted Sectum Cloud is commercial. By design, the evidence layer in the OSS core produces the same artifacts as the hosted product.
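The in-toto attestation named in the Category line follows a published format. As a rough illustration only, the sketch below wraps an evidence file in an in-toto Statement v1, which binds the attestation to the file through its SHA-256 digest. The `predicateType` URI and the `findings` predicate are hypothetical placeholders, not Sectum AI’s schema, and a real attestation would additionally be signed inside a DSSE envelope.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, the form used in in-toto subject digests."""
    return hashlib.sha256(data).hexdigest()


def build_statement(artifact_name: str, artifact_bytes: bytes, findings: list) -> dict:
    """Build an unsigned in-toto Statement v1 that names one artifact by digest.

    The predicateType URI and predicate layout are HYPOTHETICAL placeholders.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": artifact_name, "digest": {"sha256": sha256_hex(artifact_bytes)}}
        ],
        # Hypothetical predicate type for a tenant-boundary evidence pack.
        "predicateType": "https://example.com/tenant-boundary-evidence/v0",
        "predicate": {"findings": findings},
    }


# Serialize deterministically so the digest is stable across runs.
evidence = json.dumps({"run": "demo", "findings": []}, sort_keys=True).encode("utf-8")
statement = build_statement("evidence.json", evidence, findings=[])
print(json.dumps(statement, indent=2))
```

Because the subject is identified by digest rather than by path, changing a single byte of the evidence file breaks the link to the attestation.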

Capability surface:

The categorical difference

Dimension | Promptfoo | Sectum AI
Unit of analysis | A prompt or model output | A tenant boundary across the AI stack
Primary surface | The LLM endpoint (input/output) | 13 surfaces: API, vector DB, RAG, prompt/completion logs, semantic cache, KV cache, agent memory, MCP, fine-tunes / adapters, eval sets, backups, search indexes, tracing
Method | Adversarial input generation + judge scoring | Synthetic-tenant marker substrate + manifest-grounded layered detection
Determinism | Judge-based (probabilistic) | Manifest-grounded (zero false positives by construction on confirmed findings)
Output | Pass/fail per check, framework-mapped report | Tamper-evident evidence pack: RFC 3161 TSA, Sigstore Rekor inclusion proof, in-toto attestation envelope, control-mapped audit PDF, evidence.json
Verification | Re-run Promptfoo, trust the score | sectum-ai verify <pack> (any third party, without Sectum AI)
Flagship engagement | | GDPR Article 17 erasure attestation
For | Application engineering, ML platform, security | CISOs, DPOs, audit firms

The most important row is the first one: the unit of analysis. Promptfoo asks “does this prompt trip this vulnerability?” and asks it 50+ different ways. Sectum AI asks “can tenant A’s data reach tenant B across this AI infrastructure, and can I prove it?” and answers with a manifest-traceable, cryptographically attested evidence pack. Those are different questions, and most AI shops need both answered.
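One way to see how manifest-grounded detection can claim zero false positives on confirmed findings: seed each synthetic tenant with random markers and record each marker’s owner in a manifest. A finding then requires an exact manifest marker to surface in another tenant’s context, which is vanishingly unlikely to happen by coincidence, and no judge model is involved. The sketch below is a minimal, hypothetical version of that idea (`MarkerManifest` and `cross_tenant_findings` are illustrative names); it does not show Sectum AI’s actual substrate or its layered detection.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class MarkerManifest:
    """HYPOTHETICAL manifest: maps each seeded marker to the synthetic tenant that owns it."""
    owner_by_marker: dict = field(default_factory=dict)

    def seed(self, tenant_id: str) -> str:
        # 128 random bits: a collision with unrelated text is vanishingly unlikely,
        # so a marker seen outside its owner's context points to a boundary crossing.
        marker = "MRK-" + secrets.token_hex(16)
        self.owner_by_marker[marker] = tenant_id
        return marker


def cross_tenant_findings(manifest: MarkerManifest, observer: str, observed_text: str) -> list:
    """Return (marker, owner) pairs for manifest markers from OTHER tenants found in the text."""
    return [
        (marker, owner)
        for marker, owner in manifest.owner_by_marker.items()
        if owner != observer and marker in observed_text
    ]


manifest = MarkerManifest()
marker_a = manifest.seed("tenant-a")
manifest.seed("tenant-b")

# Text returned to tenant B (e.g. a RAG answer) that contains tenant A's marker.
leaked = f"Summary of your notes: {marker_a}"
print(cross_tenant_findings(manifest, "tenant-b", leaked))
print(cross_tenant_findings(manifest, "tenant-a", leaked))  # []  (own data is not a finding)
```

Every reported finding points back to a specific manifest entry, which is what makes it traceable in an evidence pack.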

Where Promptfoo is the right tool

Promptfoo’s leverage is enormous when you need:

If your AI surface is just an LLM endpoint and your security posture needs broad coverage of prompt-level risks, Promptfoo is the right answer. Its OSS license, broad ecosystem, and OpenAI backing make it hard to compete with on that front.

Where Sectum AI is the right tool

Sectum AI is built for a different question, and its leverage shows when:

Surface coverage, side by side

Surface | Promptfoo | Sectum AI
LLM endpoint (input/output) | ✓ (the primary unit) |
RAG retrieval visible from a prompt | ✓ (with adapters) |
Vector DB direct (cross-tenant integrity) | | ✓ (Pinecone, pgvector, Weaviate, Chroma live adapters)
Semantic cache (cross-tenant key safety) | | ✓ (Class 4 + live Redis adapter)
KV cache (timing side channel) | | ✓ (Class 5, statistical Cohen’s d effect-size test)
Embedding inversion across tenants | | ✓ (Class 6)
Agent tool calls (MCP confused-deputy + token passthrough) | ✓ MCP testing (red-team angle) | ✓ Class 7 (the Asana-class flaw with per-finding evidence)
Persistent agent memory across tenants | | ✓ (Class 8)
LoRA / fine-tune cross-tenant influence | | ✓ (Class 9)
Multi-turn benign extraction | ✓ (multi-round attacks) | ✓ (Class 10: Silent Leaks / IKEA-style)
RAG poisoning | ✓ (red-team angle) | ✓ (Class 3)
GDPR Article 17 erasure verification | | ✓ (Class 11: the Erasure Attestation engagement)
Observability backends (Langfuse / LangSmith / Phoenix) | | ✓ (live observability adapters)

Promptfoo is broad across LLM prompt risk; Sectum AI is broad across the tenant boundary of the AI infrastructure. The two kinds of breadth run perpendicular.
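The KV cache row mentions a Cohen’s d effect-size test. As a sketch under assumptions: time requests whose prefix another tenant has already sent (probes) against requests with a fresh prefix (controls). If shared prefixes are served from cache, probe latencies drop and the effect size grows. The function names, the latency numbers, and the 0.8 threshold (Cohen’s conventional “large” effect) are illustrative assumptions, not Sectum AI’s parameters.

```python
import statistics


def cohens_d(group_a: list, group_b: list) -> float:
    """Cohen's d: difference of means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("zero variance in both groups; effect size is undefined")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5


def looks_like_timing_leak(control_ms: list, probe_ms: list, threshold: float = 0.8) -> bool:
    """HYPOTHETICAL rule: flag when probes (prefix already sent by another tenant)
    are faster than controls (fresh prefix) by at least `threshold` pooled SDs."""
    return cohens_d(control_ms, probe_ms) >= threshold


control = [10.0, 12.0, 11.0, 13.0]  # illustrative latencies (ms), fresh prefixes
probe = [6.0, 7.0, 5.0, 6.0]        # illustrative latencies (ms), another tenant's prefix
print(round(cohens_d(control, probe), 2))
print(looks_like_timing_leak(control, probe))  # True
```

An effect size is only meaningful with enough samples per group; the short lists here just exercise the arithmetic.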

Evidence model

Promptfoo’s compliance mapping is excellent for a red-team product: OWASP, NIST, MITRE ATLAS, and the EU AI Act are all supported. The output is a structured report with framework references that a security engineer can read and a compliance team can file.

Sectum AI’s evidence model is categorically different:

This is a different shape of artifact. A red-team report says “we ran these probes, here’s what we found.” An evidence pack says “on this date, against this exact test condition, these findings hold, and here’s the cryptographic proof that you can verify yourself.” For an auditor or DPO, that distinction is the entire game.
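Part of what lets anyone check the pack is the Sigstore Rekor inclusion proof. Rekor is a Merkle-tree transparency log, and an inclusion proof is an audit path from a logged entry to a published tree root. The sketch below implements the generic audit-path check from RFC 9162 (section 2.1.3.2) with RFC 6962 hashing. It is not the code of `sectum-ai verify`, and it assumes `leaf` is the exact entry bytes the log hashed and that `root` comes from a signed checkpoint you already trust.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 domain separation: leaves are hashed with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are hashed with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_index: int, tree_size: int, leaf: bytes,
                     proof: list, root: bytes) -> bool:
    """Check a Merkle audit path following RFC 9162, section 2.1.3.2."""
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling p is on the left
            # Skip levels where this node is the rightmost and has no sibling.
            while fn != 0 and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling p is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Tiny 3-leaf log: root = H(0x01 || H(0x01 || h0 || h1) || h2)
h = [leaf_hash(x) for x in (b"a", b"b", b"c")]
root = node_hash(node_hash(h[0], h[1]), h[2])
print(verify_inclusion(0, 3, b"a", [h[1], h[2]], root))  # True
```

A verifier needs only the entry, the audit path, and a trusted root, which is why this kind of check does not depend on the party that produced the pack.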

Using both

In practice, AI shops that need both run both:

The two are perpendicular: Promptfoo goes deep on prompt-level risk; Sectum AI goes deep on tenant-boundary integrity. Neither replaces the other, and together they strengthen the same buyer’s security posture.

Honest positioning

Promptfoo is a great product with strong distribution (OpenAI now backs it). Sectum AI is not in the LLM red-team category; it focuses on multi-tenant verification with tamper-evident evidence, a space where Promptfoo doesn’t compete.

If you use Promptfoo and need cross-tenant retrieval coverage, think of it this way: Promptfoo’s cross-context probes detect prompt-level smoke, while Sectum AI verifies infrastructure-level fire. Use Promptfoo as the smoke alarm; use Sectum AI for the fire marshal’s report.

Pricing

