RAG Prompt
A concise evaluator prompt that guides the model to act as a Retrieval-Augmented Generation assistant, using provided documents to answer queries accurately and citing sources for each response, provided under the MIT license
PromptHub
Criteria Compliance Evaluator
Reviews a model's submission against the input, ground truth, and evaluation criteria, explains the reasoning, and outputs a binary "Y" or "N" to indicate whether the submission meets the specified standards, provided under the MIT license
PromptHub
Triple-Metric Answer Evaluator
Judges a model's answer to a question (with context) for correctness, comprehensiveness, and readability, returning binary scores for each along with one-line step-by-step justifications, provided under the MIT license
PromptHub
Topic-Based 1-100 Evaluator
Scores a model's output on a 1–100 scale for any specified topic, comparing it to the original input and applying user-defined criteria to produce a single fitness score, provided under the MIT license
PromptHub
RAG document relevance
Rapidly checks a retrieved document against a user question and assigns a 1 (relevant) or 0 (irrelevant) score, using simple keyword or semantic overlap to weed out off-topic retrievals, provided under the MIT license
PromptHub
Custom Criterion Scorer
Assigns a 0–100 score to a model's output based on how well it fits the given input for a specified topic, following supplied evaluation criteria and detailing step-by-step point adjustments for transparency, provided under the MIT license
PromptHub
RAG Passage Evaluator
Strictly grades an LLM-returned passage against a ground-truth answer for a given user query, assigning probabilistic precision and recall scores to quantify how completely and cleanly the passage covers the required information, provided under the MIT license
PromptHub
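To illustrate the precision/recall framing this grader relies on, here is a toy sketch using simple token overlap. This is an illustrative stand-in, not the evaluator's actual scoring method, which asks an LLM for probabilistic judgments:

```python
def token_overlap_scores(passage: str, ground_truth: str) -> tuple[float, float]:
    """Toy precision/recall for a retrieved passage.

    Precision: fraction of passage tokens that appear in the ground truth
    (how "clean" the passage is). Recall: fraction of ground-truth tokens
    that appear in the passage (how "complete" it is).
    """
    p = set(passage.lower().split())
    g = set(ground_truth.lower().split())
    if not p or not g:
        return 0.0, 0.0
    overlap = p & g
    return len(overlap) / len(p), len(overlap) / len(g)
```

A passage that states only part of the answer scores high precision but low recall; a rambling passage that buries the answer scores the reverse.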
Basic Evaluator, Binary
Evaluates an answer to a question, awarding 1 (meets criteria) or 0 (does not) based on relevance, conciseness, and usefulness, and provides step-by-step reasoning for the score, provided under the MIT license
PromptHub
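A binary judge like this typically wraps the question and answer in a rubric and parses a trailing score line from the model's reply. A minimal sketch follows; the template wording and helper names are illustrative assumptions, not the published prompt:

```python
# Hypothetical rubric template; the real prompt's wording differs.
JUDGE_TEMPLATE = """You are an impartial evaluator.
Question: {question}
Answer: {answer}
Judge the answer for relevance, conciseness, and usefulness.
Reason step by step, then end with 'Score: 1' if it meets all
criteria or 'Score: 0' if it does not."""


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the rubric template with the pair under evaluation."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_binary_score(model_output: str) -> int:
    """Find the last 'Score: ...' line; default to 0 if none is present."""
    for line in reversed(model_output.strip().splitlines()):
        if line.strip().lower().startswith("score:"):
            return 1 if line.split(":", 1)[1].strip().startswith("1") else 0
    return 0
```

Defaulting to 0 when no score line is found keeps the pipeline conservative: a malformed judge reply counts as a failure rather than a pass.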
Task-Correctness Judge
Rates how accurately and fully the answer matches a gold reference, returning a 0–10 score with a brief justification, provided under the MIT license
PromptHub
Binary Relevance Checker
Quickly filters retrieved chunks for RAG pipelines with a binary relevance decision, provided under the MIT license
PromptHub
Style-Tone Auditor
Checks that generated text matches the brand's voice and avoids forbidden wording, provided under the MIT license
PromptHub
Toxic-Content Flagger
Binary flag for disallowed toxic language, with minimal rationale, provided under the MIT license
PromptHub
Harm-Risk Evaluator
Produces a 0–100 risk score and highlights the riskiest excerpt, provided under the MIT license
PromptHub
Demographic Bias Judge
Labels answers for biased language or unequal treatment across groups, provided under the MIT license
PromptHub
Factual-Grounding Verifier
Guards against hallucination by validating each claim against its cited context, provided under the MIT license
PromptHub
Jailbreak-Resistance Judge
Detects whether a malicious prompt caused the model to break policy, provided under the MIT license
PromptHub
Paraphrase-Consistency Evaluator
Measures answer stability across paraphrased inputs to uncover brittle reasoning, provided under the MIT license
PromptHub
Adversarial Hallucination Probe
Assigns a groundedness score after an adversarial turn and surfaces unsupported claims, provided under the MIT license
PromptHub