promptfoo-evaluation

Agent Skills · Free · Open Source

Configure and run LLM evaluations with Promptfoo: configs, Python assertions, and llm-rubric judging.

🛡️ AgentReady threat assessment

MAESTRO 7-layer threat model + OWASP AIVSS risk score for promptfoo-evaluation, derived from its capabilities.

AIVSS 9.3 · Critical

Overview

Community Agent Skill for prompt testing with the Promptfoo framework. It generates promptfooconfig.yaml, writes custom Python assertions, sets up llm-rubric (LLM-as-judge) grading, and manages few-shot examples. It runs Promptfoo and Python evaluation scripts on the host.

Key features

promptfooconfig.yaml generation
Python custom assertions
llm-rubric LLM-as-judge
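
As an illustration of the Python custom assertions feature, here is a minimal sketch of the kind of file such a skill might write. It assumes Promptfoo's documented convention of a get_assert(output, context) function that returns a result with pass, score, and reason fields. The required_keywords variable and the keyword check itself are hypothetical and not taken from this skill.

```python
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Grade an output by how many required keywords it mentions.

    Assumption: the test case defines a var named `required_keywords`
    (a hypothetical name) holding a comma-separated string.
    """
    raw = str(context.get("vars", {}).get("required_keywords", ""))
    required = [kw.strip().lower() for kw in raw.split(",") if kw.strip()]

    text = output.lower()
    missing = [kw for kw in required if kw not in text]

    # With no keywords configured there is nothing to fail on.
    score = 1.0 if not required else (len(required) - len(missing)) / len(required)

    return {
        "pass": not missing,
        "score": score,
        "reason": "all keywords present" if not missing
        else "missing keywords: " + ", ".join(missing),
    }
```

A test case would typically reference a file like this from an assertion of type python. The exact file path and variable names depend on the project.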

Use cases

Setting up a Promptfoo eval suite
Adding LLM-as-judge assertions to prompts
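
The two use cases above can be sketched together as a small script that assembles an eval suite with one llm-rubric assertion per test. This is an illustrative sketch, not the skill's actual generator: build_config, the example prompt, the rubric text, and the provider id are all assumptions. The config is serialized as JSON to stay within the standard library. Converting it to promptfooconfig.yaml would need a YAML library.

```python
import json
from typing import Any, Dict, List


def build_config(prompt: str, provider: str,
                 cases: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a Promptfoo-style config dict (hypothetical helper).

    Each case supplies `vars` for the prompt template and a `rubric`
    string that an LLM judge grades the output against.
    """
    tests = []
    for case in cases:
        tests.append({
            "vars": case["vars"],
            "assert": [{"type": "llm-rubric", "value": case["rubric"]}],
        })
    return {
        "description": "Sketch of an eval suite",
        "prompts": [prompt],
        "providers": [provider],
        "tests": tests,
    }


config = build_config(
    prompt="Explain {{topic}} in two sentences.",  # example prompt template
    provider="openai:gpt-4o-mini",                 # assumed provider id
    cases=[{
        "vars": {"topic": "rate limiting"},
        "rubric": "Is accurate, concise, and avoids jargon",
    }],
)
config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed text can be saved to a config file and passed to promptfoo eval with the -c option. The llm-rubric grader calls a model, so running the suite needs provider credentials on the host.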