promptfoo-evaluation
Configure and run LLM evaluations with Promptfoo: configs, Python assertions, and llm-rubric judging.
🛡️ AgentReady threat assessment
MAESTRO 7-layer threat model + OWASP AIVSS risk score for promptfoo-evaluation, derived from its capabilities.
AIVSS 9.3 · Critical
Overview
Community Agent Skill for testing prompts with the Promptfoo framework. It generates promptfooconfig.yaml files, writes custom Python assertions, sets up llm-rubric (LLM-as-judge) grading, and manages few-shot examples. It runs Promptfoo and Python evaluation scripts directly on the host.
Key features
- promptfooconfig.yaml generation
- Python custom assertions
- llm-rubric LLM-as-judge
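
As an illustration of the Python custom assertions feature, here is a minimal sketch of an assertion file. It assumes Promptfoo's convention of calling a function named get_assert(output, context) and accepting a dict with pass, score, and reason keys as the result. The required field names (summary, sentiment) are hypothetical and not taken from this skill.

```python
import json
from typing import Any, Dict

# Hypothetical fields the model output is expected to contain.
REQUIRED_FIELDS = {"summary", "sentiment"}


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Check that the model output is a JSON object with the required fields."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return {
            "pass": False,
            "score": 0.0,
            "reason": f"Output is not valid JSON: {exc}",
        }

    if not isinstance(data, dict):
        return {
            "pass": False,
            "score": 0.0,
            "reason": "Output is valid JSON but not an object",
        }

    # Score is the fraction of required fields that are present.
    missing = sorted(REQUIRED_FIELDS - data.keys())
    score = 1.0 - len(missing) / len(REQUIRED_FIELDS)
    reason = "All required fields present" if not missing else f"Missing fields: {missing}"
    return {"pass": not missing, "score": score, "reason": reason}
```

A test case would typically reference such a file from its assert list with type python and a file:// path as the value. The context argument is unused here, but it is where Promptfoo passes the test's variables.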
Use cases
- Setting up a Promptfoo eval suite
- Adding LLM-as-judge assertions to prompts
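
Both use cases above come down to producing a config whose tests carry llm-rubric assertions. The following is a rough sketch of that shape, not the skill's actual generator. The prompt text, the provider id openai:gpt-4o-mini, the rubric wording, and the build_config helper are all illustrative assumptions. The config is serialized as JSON, which is also valid YAML, so no third-party YAML library is needed.

```python
import json
from typing import Any, Dict, List


def build_config(rubric: str, questions: List[str]) -> Dict[str, Any]:
    """Build a Promptfoo-style config with one llm-rubric assertion per test."""
    return {
        "description": "Hypothetical support-bot eval",
        # {{question}} is filled from each test's vars.
        "prompts": ["Answer the customer question concisely: {{question}}"],
        # Assumed provider id; substitute whichever model is being evaluated.
        "providers": ["openai:gpt-4o-mini"],
        "tests": [
            {
                "vars": {"question": question},
                "assert": [{"type": "llm-rubric", "value": rubric}],
            }
            for question in questions
        ],
    }


config = build_config(
    rubric="The answer is polite and does not invent refund policies.",
    questions=["How do I reset my password?", "Can I get a refund?"],
)

# JSON text that can be saved as promptfooconfig.yaml (JSON is a YAML subset).
config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving config_text to promptfooconfig.yaml and running promptfoo eval -c promptfooconfig.yaml would then grade each answer against the rubric, assuming the provider's API key is available in the environment.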